Applied Statistical Methods
UN3105 - Fall 2024
This course surveys applied statistical methods beyond linear regression. The exact topics covered can vary drastically depending on the instructor’s background.
Topic | What Problems Does It Solve? |
---|---|
Bayesian Statistics | How do we introduce prior knowledge into modeling? |
Kalman Filters + Kriging | How do we deal with temporally or spatially dependent data? |
Sampling and data quality | How do you get relevant data to your problem? |
Survival analysis | How do we deal with censored data? |
Causal inference | What else can quantify the impact besides randomized controlled trials? |
(if time allows) Sequential analysis | Can we use the data sequentially without cheating? |
Your Job
AI tools like ChatGPT are generally allowed unless explicitly banned for the assignment. You are strongly discouraged from using them for intro courses but this is not one of them. However, it is best that you prompt ChatGPT to ask you questions rather than having it provide you with solutions. You are still responsible for the correctness of your work. Here’s an example prompt you could try:
""" You are a college instructor helping students with an assignment. Your job is to help clarify and guide my thinking by asking questions back without giving me the answers to the problem. Here are 2 examples:
Question: Create a simulation that demonstrates the sample average is unbiased for estimating the population mean. Your answer: What does unbiased mean? Would you expect a single sample average to be exactly the same as the population mean?
Question: How should we evaluate a model? Your answer: What is the purpose of the model? How would you know if the model was bad? What is the model being compared to? """
- Bring your laptop, take notes!
- Avoid e-mailing if possible; share your thoughts on the discussion board instead.
- Participate and ask questions; this is not easy!
- In class: forecast what could be done, compare with what is happening, then summarize the difference.
- Ed Discussions: describe the problem: what you observed vs what you expected to see.
- To each other: summarize the conversation to ensure you’re listening and think constructively before criticizing.
- Upholding the honor code: https://www.cs.columbia.edu/education/honesty/
People
Instructor: Wayne Tai Lee (wtl2109)
Teaching Assistant: Yizi Zhang (yz4123)
Timeline
I reserve the right to change the ordering and the content for the course throughout the semester.
Logistics
Lectures: MW 2:40-4:00pm Eastern
Office Hours:
- Instructor office hours: Thursday 9:30am-11:30am, Uris 324 (previously the faculty lounge)
- TA office hours: Tuesday/Thursday 4-5pm on Zoom
Computer Setup
- I encourage you to set up your computing environment through Anaconda and Jupyter Notebooks, or to use RStudio.
- Here’s how you typeset math and code in Jupyter Notebooks
- The common commands for math and code are here
Grading
If your final grade is in [93, 100], you will earn at least an A; [90, 93) will earn you at least an A-; [87, 90) will earn you at least a B+; etc. A grading curve may be applied depending on class performance, but your grade will not be curved downwards. “At least” means you may earn a letter grade higher than the one your percentage alone would give you.
An A+ will be awarded only in exceptional cases.
- Homeworks (25%)
- Late homeworks will receive 0 credit
- The TA can grant extensions for these
- Projects (70%)
- Late projects will be penalized by 50% for each day they are late.
- Projects should be submitted on Gradescope
- The likely distribution is a 20/20/30 split across the 3 projects
- Participation (5%)
- Participation is graded through in-class activities recorded on Canvas, not through attendance.
- To pass the class, you must earn credit for at least 50% of these activities.
- You will get the full 5% if you earn credit for at least 75% of them.
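If you want to estimate where you stand, the weights above can be sketched in a few lines of Python. This is only an illustration, not the official grade calculation: the function names are made up, the 20/20/30 project split is just the likely one, the late penalty is assumed to subtract half of the earned score per day, and participation below 75% is assumed to scale linearly (the policy above does not say how it scales). Only the letter cutoffs listed above are included.

```python
# Weights from the grading breakdown; the project split is only the "likely" one.
HOMEWORK_WEIGHT = 25.0
PROJECT_WEIGHTS = (20.0, 20.0, 30.0)
PARTICIPATION_WEIGHT = 5.0


def late_project_score(score, days_late):
    """Apply the 50%-per-day late penalty to a project score in [0, 1].

    Assumption: each late day removes half of the earned score,
    so a project two or more days late earns nothing.
    """
    return score * max(0.0, 1.0 - 0.5 * days_late)


def participation_points(fraction_completed):
    """Full 5 points at 75% or more of the in-class activities.

    Assumption: below 75%, credit scales linearly.
    """
    return PARTICIPATION_WEIGHT * min(1.0, fraction_completed / 0.75)


def final_percentage(homework, projects, fraction_completed):
    """Combine scores; homework and each project are fractions in [0, 1]."""
    project_points = sum(w * s for w, s in zip(PROJECT_WEIGHTS, projects))
    return (HOMEWORK_WEIGHT * homework
            + project_points
            + participation_points(fraction_completed))


def minimum_letter(percentage, fraction_completed):
    """Return the guaranteed minimum letter grade before any upward curve."""
    if fraction_completed < 0.5:
        return "not passing"  # below the 50% participation requirement
    if percentage >= 93:
        return "A"
    if percentage >= 90:
        return "A-"
    if percentage >= 87:
        return "B+"
    return "below B+ (cutoffs not listed here)"


if __name__ == "__main__":
    pct = final_percentage(0.9, (1.0, 0.8, 0.9), 0.8)
    print(round(pct, 1), minimum_letter(pct, 0.8))
```

For example, 90% on homeworks, project scores of 100%, 80%, and 90%, and credit for 80% of the in-class activities work out to 22.5 + 63 + 5 = 90.5 points under these assumptions, which guarantees at least an A-.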
Prerequisites
- Exposure to foundational statistics and probability
- A computing course that involved manipulating data
- Linear regression
Textbooks / Supplies
No textbook is required, but references are available on the syllabus.