Statistical Computing
Basic computing skills are essential for data manipulation and statistical simulation. Data manipulation is how you explore data, run algorithms, and automate repetitive tasks. Simulations let you sanity-check mathematical results, measure the impact of violated assumptions, and validate your models, and each of these deepens your intuition.
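As a small illustration of checking a theoretical result by simulation, the sketch below compares the closed-form birthday-problem probability with a Monte Carlo estimate. It assumes birthdays are independent and uniform over 365 days (leap years ignored). The function names and the default of 20,000 trials are illustrative choices, not taken from the notes.

```python
import random


def exact_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming uniform birthdays."""
    p_all_distinct = 1.0
    for k in range(n):
        # The k-th person must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def simulated_shared_birthday(n, days=365, trials=20_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)  # seeded so results are reproducible
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        # A shared birthday exists exactly when the set has fewer elements than the list
        if len(set(birthdays)) < n:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n = 23
    print("exact:    ", exact_shared_birthday(n))
    print("simulated:", simulated_shared_birthday(n))
```

With `n = 23` the exact probability is about 0.507, and the simulated value should typically land within about 0.01 of it. A large disagreement would point to a mistake in either the formula or the simulation, which is exactly the kind of check described above.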
Syllabi and Materials
Syllabi:
Pre-requisites
- Exposure to an introductory statistics class
- Average, median, hypothesis testing, correlation, and the difference between a random variable and its realization
Computer Setup
I encourage you to set up Jupyter Notebooks on your computer so that you can rerun these examples in R or Python later.
Topic Notes
For those who want to learn data analysis with minimal computer science background, I’ve written some notes to help:
- Learning R through Examples and Errors. After going through these notes, you should be able to:
- Collect data from the internet
- Manipulate messy data into an easy-to-analyze format
- Write custom functions to process the data
- Summarize/plot data efficiently
- TODO: another set of notes focusing on statistical computing:
- Empirically verifying theoretical ideas
- Estimating probabilities using simulations
- Bootstrapping, cross-validation, etc.
- Basic lessons on distributed computing
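The bootstrap item can be illustrated with a percentile bootstrap confidence interval: resample the data with replacement many times, recompute the statistic on each resample, and read off empirical quantiles of those recomputed values. This is a minimal standard-library sketch; `bootstrap_ci`, its defaults, and the example data are hypothetical choices for illustration, and the percentile method is only one of several ways to build a bootstrap interval.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(data)
    # Draw n_boot resamples of size n with replacement; record the statistic for each
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Use the empirical alpha/2 and 1 - alpha/2 quantiles as interval endpoints
    lo = boot_stats[int((alpha / 2) * n_boot)]
    hi = boot_stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical example data
    data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4, 6.2, 5.0]
    print("95% CI for the mean:  ", bootstrap_ci(data))
    print("95% CI for the median:", bootstrap_ci(data, stat=statistics.median))
```

Because `stat` is a parameter, the same loop works for the median or any other statistic without a closed-form standard error, which is the main appeal of the bootstrap.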