Statistical Computing and Introduction to Data Science
GR5206 / 4206 - Fall 2024
Learning outcomes
- Understand basic programming
- Manipulate data with different structures
- Control flow
- Functions
- Explore data via visualization
- Study statistical concepts via simulations
- Automate tasks with programming
- Understand basic optimization
Prerequisites
- An introductory statistics class
- Basic probability distributions (e.g. Gaussian, binomial distributions and their likelihoods)
- Basic hypothesis testing (e.g. t-test)
- Summary statistics
- Histograms, boxplots, etc.
- Multivariate calculus
- Derivatives and functions
- Matrix operations and inverses of matrices
- You should be at least co-enrolled in a modeling class like regression
Textbooks and references
- Google!
- Python concept notes
- Python Data Science Handbook
- Basics only - Programming with Python by Software Carpentry
- LearningPython.org
- Data engineering references (not covered in this class):
- Designing Data-Intensive Applications by Martin Kleppmann (available in NYPL)
- System Design Interview - An Insider’s Guide by Alex Xu
- AI tools like ChatGPT are generally NOT allowed unless explicitly permitted for the assignment. You are strongly discouraged from using them in introductory courses such as this one. If you cannot resist the temptation, it is best to prompt ChatGPT to ask you questions rather than to provide you with solutions. You are still responsible for the correctness of your work. Here’s an example prompt you could try:
You are a college instructor helping students to learn Python fundamentals. You should not give the solution to students but help clarify and guide their thinking by asking questions back or providing counter examples. Here are 2 examples:
Question: """ create a simulation that demonstrates the sample average is unbiased for estimating the population mean. """
Your answer: """ What does unbiased mean? Let's write the code that will draw a sample first. """
Question: """ Where is my bug?
y = x.shuffle()
sum(y)
TypeError: 'NoneType' object is not iterable """
Your answer: """ What do you think the type of `y` is? """
Timeline
I reserve the right to change the ordering and content of the course throughout the semester.
Logistics
Class time: F 10:10am - 12:40pm, Location: 301 Uris
Teaching Team
See Ed for office hours
Grading
If your final grade is in [93, 97), you will earn at least an A; [90, 93) will earn at least an A-; [87, 90) will earn at least a B+; etc. A grading curve may be applied depending on class performance, but I will not curve downwards. I will not give out A+ grades for this class.
- Homeworks (25%)
- Late homeworks will receive 0 credit
- No make-up homeworks will be granted even if you registered late to the class
- If you want to learn how to use Google Colab, follow these instructions
- Please read these important things related to submitting homeworks on Ed
- Exams (70%)
- 2 weighting schemes
- Midterm (30%), Final (40%)
- Midterm (15%), Final (55%)
- You will receive a letter grade from curving each scheme, and your final letter grade will be the higher of the 2.
- Participation (5%)
- In class participation
- Online question posting (non-private) and answers are all ways to achieve this
- I will reach out after the midterm if you are at risk of missing some points here.
- You can miss one of these for free.
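As a rough illustration of the 2 weighting schemes above, here is a minimal Python sketch that computes the raw weighted score under each one. The function name, the 0-100 scale, and the example scores are all made up for illustration. The sketch does not model the curve, so it only shows how the weights combine, not how letter grades are assigned.

```python
# Illustrative sketch only: names, the 0-100 scale, and the example scores are
# assumptions, not part of the official grading policy. The curve is NOT modeled.
SCHEMES = {
    "scheme_1": {"homework": 0.25, "midterm": 0.30, "final": 0.40, "participation": 0.05},
    "scheme_2": {"homework": 0.25, "midterm": 0.15, "final": 0.55, "participation": 0.05},
}


def weighted_scores(scores):
    """Return the raw weighted score (0-100) under each weighting scheme."""
    return {
        name: sum(weight * scores[component] for component, weight in weights.items())
        for name, weights in SCHEMES.items()
    }


# Made-up scores for a student who did better on the final than on the midterm.
example = {"homework": 90, "midterm": 70, "final": 90, "participation": 100}
results = weighted_scores(example)
for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

With these made-up scores, the second scheme gives the higher raw score (87.50 versus 84.50) because it puts more weight on the final. The actual policy compares curved letter grades from each scheme, not raw scores.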
Exam accommodations
In order to receive disability-related academic accommodations for this course, students must first be registered with their school Disability Services (DS) office. Detailed information is available online for both the Columbia and Barnard registration processes.
Refer to the appropriate website for information regarding deadlines, disability documentation requirements, and drop-in hours (Columbia) / intake sessions (Barnard).
For this course, students are not required to have testing forms or accommodation letters signed by faculty. However, students must do the following:
- The Instructor section of the form has already been completed and does not need to be signed by the professor.
- The student must complete the Student section of the form and submit the form to Disability Services.
- Master forms are available in the Disability Services office or online: https://health.columbia.edu/services/testing-accommodations
Expectations
- Take chances!
- Break the code in lecture
- Give feedback in office hours or by e-mail; don’t waste your time if you think a topic is not helpful
- Participate and ask questions; this is not easy!
- In class: forecast what should be done, compare with what is happening, then summarize the difference.
- Online: describe what you observe, describe what you expect, communicate clearly.
- To each other: summarize the conversation to ensure you’re listening and think constructively before criticizing.
- Academic honesty: https://www.cs.columbia.edu/education/honesty/
Acknowledgement
Many of these materials are based on materials from Prof. Thibault Vatter and Prof. Gabriel Young.