Wayne's Github Page

A place to learn about statistics

Applied machine learning

This course will expose students to a variety of data mining applications using machine learning methods.

Students who finish this class should:

Prerequisites

Textbooks / References

Timeline

I reserve the right to change the ordering and content of the course throughout the semester.

| Date | Topic | Reference | Due |
| --- | --- | --- | --- |
| 2025-01-21 | Intro to data mining | Brazilian e-commerce on Kaggle; ISL Chapter 2.2 | |
| 2025-01-23 | Data mining with basic statistics and regression review | ISL Chapter 3 | Have RStudio installed; informal exploration with the Brazilian e-commerce dataset |
| 2025-01-28 | Regression review continued | ISL Chapter 3 | |
| 2025-01-30 | Principal Component Analysis | ISL 6.3.1 | Homework 1 Due |
| 2025-02-04 | Principal Component Analysis Applications | ISL 6.3.1 + ISL 10.2 | |
| 2025-02-06 | Logistic regression + Naive Bayes and notes | ISL 6.3.1 + ISL 10.2 | |
| 2025-02-11 | Beyond classification accuracy + rise of machine learning and “wrong” models: some history | Paper on why to use biased estimators, given the Stein estimator + Gauss-Markov theorem + ISL Chapter 2.2 (continued) | |
| 2025-02-13 | Ridge + Lasso Regression and notebook | ISL 6.2 | Homework 2 (due date delayed slightly) |
| 2025-02-18 | Tree Methods and notebook | ISL 8.1 | |
| 2025-02-20 | Trees + forests with real data and notebook | ISL 8.2; bias in random forest variable importance | |
| 2025-02-25 | Ridge + Lasso Simulations | ISL 6.2 | |
| 2025-02-27 | Data Pipelines | ISL 8.1 | Homework 3 |
| 2025-03-04 | Data pipelines continued; optimization and objective functions | caret library; ISL Chapter 3.1.1 + 3.3.3 | |
| 2025-03-06 | Guest lecture | Paper: Fast Interpretable Greedy-Tree Sums | |
| 2025-03-11 | Resampling techniques: accuracy vs. robustness | Slides 7 + resampling from ISL | Read paper on stability |
| 2025-03-13 | Automated Model Selection | Slides 7 + ISL on resampling | Project 1 |
| 2025-03-18 | Spring Break | | |
| 2025-03-20 | Spring Break | | |
| 2025-03-25 | Clustering: K-means | ISL 10.2 | |
| 2025-03-27 | Clustering: K-means continued | ISL 10.2 | |
| 2025-04-01 | K-means with real data | ISL 10.2 | |
| 2025-04-03 | Hierarchical clustering | ISL 10.2 | Homework 4 |
| 2025-04-08 | Hierarchical clustering with real data | ISL 10.2 | |
| 2025-04-10 | DBSCAN | DBSCAN from KDNuggets | |
| 2025-04-15 | Feature engineering with text | Pre-processing text + Speech and Language Chapter 6.5 | |
| 2025-04-17 | Working with text data continued | | |
| 2025-04-22 | Independent Component Analysis | Stanford ICA slides | |
| 2025-04-24 | Models on text, including Wordfish | | Homework 5 |
| 2025-04-29 | Going over final projects in class | | |
| 2025-05-01 | Going over final projects in class + what we didn’t teach | | Final Project |

Logistics

Lectures: TuTh 2:40pm - 3:55pm, 602 Hamilton Hall

Teaching Team

Online Discussion

The TA and grader will check the online discussion for 30 minutes each weekday, so do not expect an immediate response. Please start your work early and post your questions as clearly as possible.

Grading

If your final grade is in [93, 100], you will earn at least an A; [90, 93), at least an A-; [87, 90), at least a B+; and so on. A grading curve may be applied depending on class performance, but I will not curve downward. I may not give out A+ grades in this class.

- Homeworks (20%)

Acknowledgement

Much of this material is based on materials from Prof. Vincent Dorie.