UN2102 Applied Statistical Computing and Data Mining
Learning outcomes
- Manipulate data with different structures
- Explore data via visualization and basic models
- Tackle familiar statistical concepts using simulations
- Solve problems by working backwards and decomposing complex tasks
- Debug code
People
Instructor: Wayne Tai Lee (wtl2109)
Teaching Assistant(s):
- TBD
- TBD
Timeline
I reserve the right to change the ordering and the content for the course throughout the semester.
Date | Topic | Reference | Due |
---|---|---|---|
2021-01-12 | Why statistical computing | ||
2021-01-14 | Variables, vectors, and functions on vectors with in-class prompts | Simulating LLN | Install R 4.0, then RStudio |
2021-01-19 | For-loops | Simulating LLN | |
2021-01-21 | Recreating Fisher’s results | Past course notes on subsetting | |
2021-01-26 | Data frames and booleans | ||
2021-01-28 | Writing functions in R | Scope | Homework 1 Due |
2021-02-02 | Data visualization - base R | Text 7.1.1 + 7.1* | |
2021-02-04 | if/else | ||
2021-02-09 | Review session | | Homework 2 Due |
2021-02-11 | Take-Home Midterm 1 | ||
2021-02-16 | Joins | ||
2021-02-18 | Lists | Text 5 | |
2021-02-23 | *apply functions and vectorized calculations | | Homework 3 |
2021-02-25 | More practice on data wrangling | ||
2021-03-02 | Spring Recess No Class | ||
2021-03-04 | Spring Recess No Class | ||
2021-03-09 | Entering tidyverse with data visualization: ggplot() and %>% (ggplot video on Vimeo and 優酷; %>% operator video on Vimeo and 優酷) | Online Tutorials | Homework 4 |
2021-03-11 | Working with text | ||
2021-03-16 | Working with text continued | ||
2021-03-18 | Reading in different types of data (lecture on Vimeo or 優酷) | ||
2021-03-23 | Review session | | Homework 5 |
2021-03-25 | Take-Home Midterm 2 | ||
2021-03-30 | Scraping (Vimeo lectures and 優酷 lecture) | ||
2021-04-01 | API Calls (Vimeo lecture and 優酷 lecture) | ||
2021-04-06 | Simulations (video on Vimeo and 優酷) | ||
2021-04-08 | Permutations (video on Vimeo and 優酷) | ||
2021-04-13 | Cleaning code | ||
2021-04-15 | What we don’t know | | Homework 6 |
2021-04-20 | Take-Home Final Exam | ||
Expectations
- Come to class, bring your laptop, take chances!
- Run through the code in lecture
- Take notes that augment the lectures
- Give feedback; don’t waste your time if you think a topic is not helpful
- Participate and ask questions; this is not easy!
- In class: forecast what should be done, compare with what is happening, then summarize the difference.
- Online: describe what you observe, describe what you expect, communicate clearly.
- To each other: summarize the conversation to ensure you’re listening and think constructively before criticizing.
- Academic honesty: https://www.cs.columbia.edu/education/honesty/
Logistics
Lectures: TuTh 4:10pm - 5:25pm, Zoom Link on Canvas
Office Hours:
- Wayne: TBD
- TBD
- TBD
Grading
If your final grade is in [93, 100], you will earn at least an A; [90, 93) will earn at least an A-; [87, 90) will earn at least a B+; and so on. A grading curve may be applied depending on class performance, but I will not curve downwards. I may not give out any A+ grades in this class.
- Homeworks (15%)
  - Late homeworks will receive 0 credit
  - Homework solutions will exist in R
  - Your lowest homework grade will be dropped (this is for students who add this course late)
  - No make-up homeworks will be granted, even if you registered late for the class
  - Please export all homeworks as PDF files following these instructions
  - You should try using R Markdown to create your solutions
- Exams (80%)
  - Midterms (25% each)
  - Final (30%)
- Participation (5%)
  - This will be based on in-class online activities
  - You’ll need at least 50% here to pass the class
  - If you achieve 50% participation, you will receive the full 5% credit
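To make the weighting above concrete, here is a rough sketch in R of how the components might combine into a final score. The function name `final_grade`, its arguments, the 0-100 scale, the equal weighting of the remaining homeworks, and the sample scores are all assumptions for illustration; this is not the official grade calculation.

```r
# Hypothetical helper for illustration only; not part of the course materials.
# All scores are assumed to be on a 0-100 scale.
final_grade <- function(homeworks, midterm1, midterm2, final_exam, participation) {
  # Drop the single lowest homework, then average the rest.
  # Equal weighting across homeworks is an assumption.
  # (Needs at least two homework scores to be meaningful.)
  hw_kept <- sort(homeworks)[-1]
  hw_avg <- mean(hw_kept)

  # Participation is all-or-nothing: 50% or more earns the full 5%.
  # Below 50% the syllabus says you cannot pass, so the flag matters
  # more than the credit in that case.
  meets_minimum <- participation >= 50
  participation_credit <- if (meets_minimum) 100 else 0

  total <- 0.15 * hw_avg +
    0.25 * midterm1 +
    0.25 * midterm2 +
    0.30 * final_exam +
    0.05 * participation_credit

  list(total = total, meets_participation_minimum = meets_minimum)
}

# Made-up scores: one missed homework (0) is dropped as the lowest.
result <- final_grade(
  homeworks = c(80, 95, 0, 90, 100, 85),
  midterm1 = 88, midterm2 = 92, final_exam = 90,
  participation = 60
)
print(result$total)
```

With these made-up scores the remaining homeworks average 90, so the total is 13.5 + 22 + 23 + 27 + 5 = 90.5, which falls in the 90-to-93 band (at least an A-).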
Exam accommodations
To receive disability-related academic accommodations for this course, students must first be registered with their school's Disability Services (DS) office. Detailed information is available online for both the Columbia and Barnard registration processes.
Refer to the appropriate website for information regarding deadlines, disability documentation requirements, and drop-in hours (Columbia) or intake sessions (Barnard).
For this course, students are not required to have testing forms or accommodation letters signed by faculty. However, students must do the following:
- The Instructor section of the form has already been completed and does not need to be signed by the professor.
- The student must complete the Student section of the form and submit the form to Disability Services.
- Master forms are available in the Disability Services office or online: https://health.columbia.edu/services/testing-accommodations
Prerequisites
- An introductory statistics class
- Basic probability distributions (e.g. Gaussian, binomial distributions and their likelihoods)
- Basic hypothesis testing (e.g. t-test)
- Summary statistics
- Histograms, boxplots, etc
- Some understanding of Microsoft Excel or Google Spreadsheets
Textbooks / References
- The Art of R Programming: A Tour of Statistical Software Design is available through CLIO
- Advanced R is available online
- Past course notes are available online
Acknowledgement
Much of this material is based on materials from Prof. Thibault Vatter and Prof. Gabriel Young.