Applied Statistical Methods
UN3105 - Fall 2020
This course is meant to give you a survey of various applied statistical methods. The choice of topics can vary drastically depending on the instructor’s background.
Topic | What Problems Does It Solve? |
---|---|
Sampling and data quality | How do you get the data relevant to your problem? |
Bayesian Statistics | How do we introduce prior knowledge into modeling? |
Kalman Filters + Kriging | How do we deal with temporally or spatially dependent data? |
Survival analysis | How do we deal with censored data? |
Causal inference | What else can quantify the impact besides randomized controlled trials? |
(if time allows) Sequential analysis | Can we use the data sequentially without cheating? |
Expectations
- Learning outcomes
  - Students should be able to critique data-based studies
  - Students should be able to articulate the difference between their ideal dataset and feasible dataset
  - Students should be able to collect/wrangle/clean/transform their data
  - Students should be able to diagnose models and understand their strengths/weaknesses
  - Students should be able to identify the classic models for certain types of data/problem
  - Students should be able to simulate/hypothesize alternative scenarios that can explain the same patterns in the data
- Your Job
  - Come to class, bring your laptop, take chances!
  - Give feedback in office hours or by e-mail; I don’t want to waste your time.
  - Avoid e-mailing if possible; share your thoughts on the discussion board instead.
  - Participate and ask questions; this is not easy!
    - In class: forecast what should be done, compare with what is happening, then summarize the difference.
    - Canvas: describe what you observe, then describe what you expect.
    - To each other: summarize the conversation to ensure you’re listening, and think constructively before criticizing.
- Academic honesty: https://www.cs.columbia.edu/education/honesty/
People
Instructor: Wayne Tai Lee (wtl2109)
Teaching Assistant(s): Navid Ardeshir (na2844)
Timeline
I reserve the right to change the ordering and content of the course throughout the semester.
Date | Topic | Follow-up | Before-Class |
---|---|---|---|
2020-09-08 | Introductions and expectations | syllabus | |
2020-09-10 | Revisiting data collection and common errors | Sampling: Design and Analysis Chap 1-2.2 | |
2020-09-15 | Sampling and practice with NHANES and Discussion on Paper | Sampling: Design and Analysis Chap 1-2.2 | Homework 0 due |
2020-09-17 | Introduction to Data Quality | Read Modeling Ideology and Predicting Policy Change with Social Media by Zhang and Counts | |
2020-09-22 | How to start a problem? discussion on reading | The Silent Sex: Gender, Deliberation, and Institutions, Mendelberg and Karpowitz, Chapter 3 | |
2020-09-24 | Discussion on EDA with focus on NYTimes Comments | Homework 1 - NYTimes EDA | |
2020-09-29 | Regression Refresher with R | A Modern Approach to Regression with R | |
2020-10-01 | Regression with NYTimes based on Reading | Exploring characteristics of online news comments and commenters with machine learning approaches by Lee and Ryu | |
2020-10-06 | Crash course in Bayesian Statistics | Doing Bayesian Data Analysis by John Kruschke | Project 1 Due |
2020-10-08 | Contrasting Bayesian Methods with Classical Methods | ||
2020-10-13 | Dependent Data - Problems with Temporal Data | Homework 2 | |
2020-10-15 | Dependent Data Continued - Time Series and Kalman Filters | Chapter 1 of this dissertation | |
2020-10-20 | Practice - Forecasting Temperature | For manipulating spatial data in R: rspatial.org | |
2020-10-22 | Dependent Data - GIS View of Spatial Data | ||
2020-10-27 | Dependent Data continued - Spatial Statistics | Interpolation of Spatial Data: Some Theory for Kriging - Ch 1.2 | Homework3 |
2020-10-29 | Practice with Kriging | ||
2020-11-03 | NO CLASS - Election day | ||
2020-11-05 | Discussion on spatial data privacy | Twelve Million Phones, One Dataset, Zero Privacy | |
2020-11-10 | Survival data and the issue of censoring | Survival analysis: models and applications Chapter 1 | |
2020-11-12 | Simulating challenges from censored data | Vignette on R package survival, Vignette on time dependent survival analysis, and Survival analysis: models and applications Chapter 2.1.1 + 2.1.2 + 5.1 | Project 2 Due |
2020-11-17 | Survival curves and the Kaplan-Meier Estimator | ||
2020-11-19 | Practice with survival analysis | Framingham Heart Study (Study description on Canvas) | |
2020-11-24 | Missing data | The prevention and handling of the missing data; Homework4 | |
2020-11-26 | NO CLASS - Thanksgiving Holiday | ||
2020-12-01 | Discussion - flaws in randomized controlled studies | Randomization in the tropics revisited: a theme and eleven variations | |
2020-12-03 | Causal inference - AB testing in tech and traps | Homework5 | |
2020-12-08 | Causal Inference - matching algorithms, difference-in-differences | Joshua D. Angrist and Jörn-Steffen Pischke (2015). Mastering ’Metrics: The Path from Cause to Effect, chapter 5 (see Canvas); Modern Algorithms for Matching in Observational Studies by Paul Rosenbaum, 2020 | |
2020-12-10 | Wrap-up | Project 3 Due | |
TBD | Measure understanding | Final Exam | You! |
Logistics
- Lectures: TuTh 11:40-12:55 Eastern on Zoom (links on Canvas)
- Office Hours: No office hours for TA, use discussion board for questions
- Instructor office hours by appointment only
Computer Setup
- I encourage you to set up your computing environment through Anaconda and Jupyter Notebooks since we’ll be using Jupyter Notebooks in class.
- Here’s how you typeset math and code in Jupyter Notebooks
- The common commands for math and code are here
- Please have your photo posted on Zoom
Grading
If your final grade is in [93, 100), you will earn at least an A; [90, 93) will earn you at least an A-; [87, 90) will earn you at least a B+; etc. A grading curve may be applied depending on the class performance, but your grade will not be curved downwards. “At least” means you may earn a letter grade higher than the one your percentage alone would give.
An A+ will be awarded only in exceptional cases.
- Homeworks (20%)
  - Late homeworks will receive 0 credit.
  - Your lowest homework grade will be dropped. If you missed Homework0 because you enrolled late, this prevents you from receiving 0.
- Projects (70%)
  - Late projects will be penalized by 50% for each day they are late.
  - Projects should be submitted on Canvas.
- Participation (10%)
  - Instead of attendance, participation is graded through in-class activities recorded on Canvas.
  - To pass the class, you must have at least 50% here.
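
The weights above can be turned into a small calculation. The Python sketch below is one possible reading of the rules, not the official grading procedure. It assumes every score is on a 0-100 scale, that items within a category are averaged equally, and that any late penalty has already been applied to the scores passed in. The function names are hypothetical, only the letter cutoffs spelled out above (A, A-, B+) are encoded, and any curve is ignored.

```python
def weighted_final_percentage(homeworks, projects, participation):
    """Combine category scores (each 0-100) using the 20/70/10 weights."""
    if len(homeworks) < 2:
        raise ValueError("need at least two homework scores to drop the lowest")
    if not projects:
        raise ValueError("need at least one project score")
    # Drop the single lowest homework score, then average the rest.
    kept = sorted(homeworks)[1:]
    homework_avg = sum(kept) / len(kept)
    # Assumption: projects count equally within the project category.
    project_avg = sum(projects) / len(projects)
    return 0.20 * homework_avg + 0.70 * project_avg + 0.10 * participation


def minimum_letter(percentage):
    """Return the guaranteed minimum letter for the cutoffs listed above."""
    for cutoff, letter in [(93, "A"), (90, "A-"), (87, "B+")]:
        if percentage >= cutoff:
            return letter
    return None  # cutoffs below B+ are not spelled out in the syllabus


def meets_participation_requirement(participation):
    """Passing requires at least 50% in the participation category."""
    return participation >= 50


# Made-up scores for illustration: a missed Homework0 (score 0) is dropped.
example = weighted_final_percentage(
    homeworks=[0, 80, 90, 100], projects=[90, 95, 85], participation=100
)
print(round(example, 2), minimum_letter(example))
```

In this made-up example the zero on the first homework has no effect on the result, which is the point of dropping the lowest homework grade.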
Prerequisites
- Exposure to foundational statistics and probability
- A computing course that involved manipulating data
- Linear regression
Textbooks / Supplies
There is no textbook, but references are available on the syllabus.