GU4205/5205 - Linear Regression Models
This class is designed for advanced undergraduates or master's students who need a solid mathematical understanding of regression to support their future study of more advanced models.
Expectations
- Learning outcomes
- Understand when linear regression is used for a descriptive, predictive, or prescriptive purpose
- Understand how the linear algebra behind linear regression can help us diagnose the behavior of regression models
- Be able to run, diagnose, and improve your linear regression models using common methodologies
- Be able to construct counterexamples in which linear regression fails
- Be able to confirm the mathematical derivations through simulation
- Your Job
- Come to class, bring your laptop, take chances!
- Run through the code and derivations in lecture
- Take notes that augment the lectures
- Give feedback in office hours or by e-mail; I don't want to waste your time.
- Participate and ask questions; this material is not easy!
- In class: forecast what should be done, compare with what is happening, then summarize the difference.
- Online: describe what you observe, describe what you expect, communicate clearly.
- To each other: summarize the conversation to ensure you’re listening and think constructively before criticizing.
- Academic honesty: https://www.cs.columbia.edu/education/honesty/
People
Instructor: Wayne Tai Lee: wtl2109
Teaching Assistant(s): TBD
Timeline
I reserve the right to change the ordering and the content of the course throughout the semester.
Logistics
Lectures: MW 2:40pm - 3:55pm, Location: 301 Pupin Laboratories
Office Hours:
- Tu 2:00pm - 4:30pm, Location: 610 Watson Hall (612 W 115th St 6F), led by Wayne
- Th 2:00pm - 4:30pm, Location: 610 Watson Hall (612 W 115th St 6F), led by Wayne
- F 9:00am - 12:00pm, Location: 10th floor School of Social Work lounge area, led by Yian Huang
- Tu 12:30pm - 3:20pm (sharp, not delayed), Location: 10th floor School of Social Work lounge area, led by Navid Ardeshir
Grading
If your final grade is in [93, 100], you will earn at least an A; [90, 93), at least an A-; [87, 90), at least a B+; and so on. A grading curve may be applied depending on class performance, but it will never curve grades downward. I may not give out an A+.
- Homework (15%)
- Late homework will receive no credit
- Homework solutions will be provided in R
- Your lowest homework grade will be dropped
- No make-up homework will be granted, even if you registered for the class late
- Exams (80%)
- Midterms (25% each)
- Final (30%)
- Participation (5%)
- This will be based on in-class online activities
- Possible recovery
Your final percentage grade will be the maximum of the following four values; the idea is that a strong final exam can replace part of your midterm grades:
- Homework * 0.15 + Midterm1 * 0.25 + Midterm2 * 0.25 + Final * 0.3 + Surveys * 0.05
- Homework * 0.15 + Midterm1 * 0.15 + Midterm2 * 0.15 + Final * 0.5 + Surveys * 0.05
- Homework * 0.15 + Midterm1 * 0.15 + Midterm2 * 0.25 + Final * 0.4 + Surveys * 0.05
- Homework * 0.15 + Midterm1 * 0.25 + Midterm2 * 0.15 + Final * 0.4 + Surveys * 0.05
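As a sketch of the recovery rule above, the Python below computes all four weighted totals and keeps the maximum. It assumes every component is already a percentage on a 0-100 scale and that the homework component is the plain mean of your scores after dropping the lowest one; the syllabus does not say exactly how homework scores are aggregated, so `drop_lowest_average` and `final_percentage` are hypothetical helpers, not the course's official grading code.

```python
def drop_lowest_average(scores):
    """Mean of homework scores after dropping the single lowest one.

    Assumption: homework is an unweighted mean on a 0-100 scale.
    """
    if len(scores) < 2:
        raise ValueError("need at least two homework scores to drop one")
    kept = sorted(scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


# Weights for (Homework, Midterm1, Midterm2, Final, Surveys),
# one tuple per formula listed above; each tuple sums to 1.
SCHEMES = [
    (0.15, 0.25, 0.25, 0.30, 0.05),
    (0.15, 0.15, 0.15, 0.50, 0.05),
    (0.15, 0.15, 0.25, 0.40, 0.05),
    (0.15, 0.25, 0.15, 0.40, 0.05),
]


def final_percentage(homework, midterm1, midterm2, final, surveys):
    """Return the largest of the four weighted totals (hypothetical helper)."""
    components = (homework, midterm1, midterm2, final, surveys)
    totals = [
        sum(w * c for w, c in zip(weights, components))
        for weights in SCHEMES
    ]
    return max(totals)


# Made-up scores: a weak first midterm and a strong final.
hw = drop_lowest_average([80, 95, 90, 100])  # drops 80, mean of 90, 95, 100
print(round(final_percentage(hw, 60, 85, 90, 100), 2))  # prints 86.0
```

In this made-up example the second formula (final weighted at 0.5) gives the largest total, which is the situation the recovery rule is meant to cover. Because every weight tuple sums to 1, a student with the same score in every component gets exactly that score no matter which formula wins.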
Prerequisites
- Some familiarity with R or Python
- You need to know how to write a loop and visualize data
- An introductory statistics class
- Basic probability distributions (e.g. Gaussian, binomial distributions and their likelihoods)
- Basic hypothesis testing (e.g. t-test)
- Summary statistics
- Basic linear algebra
Textbooks / Supplies
- Applied Linear Regression Models, 4th Edition, by Kutner, Nachtsheim, and Neter
- Statistical Models: Theory and Practice by David A. Freedman
Acknowledgement
Much of this material is based on materials from Prof. Ronald Neath and Prof. Gabriel Young.