Description
CIS 5200 provides a fundamental introduction to the mathematics, algorithms, and practice of machine learning, focusing on representation, loss functions, and optimization. Topics covered include:
- Supervised learning: least squares regression, logistic regression, L0/L1/L2 feature selection/regularization, online learning, boosting, Naive Bayes, support vector machines, ensemble methods, neural nets/deep learning
- Unsupervised learning: PCA, K-means clustering, Gaussian Mixture Models, EM, HMMs, Bayesian networks
- Reinforcement learning: TD-learning, Q-learning, deep learning
Audience
The course is aimed broadly at advanced undergraduates and beginning graduate students in computer science, electrical engineering, mathematics, physics, and statistics. This is a hard course. A good alternative for students with less math background or less time is CIS 419/519; if you want a really nice, much easier introduction, take the Coursera ML course. If you are unsure which to take, see this.
Software
We will be coding in Python, using Jupyter notebooks and the scikit-learn and PyTorch libraries, running on Google Colab.
Pre-requisites
- Basic probability and statistics (random variables, covariance matrix, CDF/PDF, Gaussian and other distributions, multiple regression). [CSE 261]
- Basic linear algebra (matrices, vectors, rank, basis, projection, inverse, eigenvectors).
- Reasonable programming skills, including basic knowledge of Python.
Format
The format this year will be a mix of:
- Lectures: these can be attended live or watched as video recordings
- Worksheets: step-by-step Jupyter notebooks that cover the core material
- Homework, quizzes, final project, midterm, final (see below)
- "Pods": mandatory-attendance, TA-led discussion groups that meet for one hour a week. They should help you meet other students, ensure that you understand the material, and allow for discussion of broader topics.
Evaluation
- 20% Worksheets: These are graded primarily on being completed on time (but answers need to be sensible).
- 20% Problem Sets: Your lowest homework score will be dropped. Any homework turned in late will be penalized 25 points per late day or fraction of a day.
- 10% Participation/attendance: Attendance is taken only for your pod, with one permitted absence.
- 10% Final project: Late days are not permitted for the final project.
- 10% Quizzes and Surveys
- 10% Midterm
- 20% Final (cumulative: 1/3 on pre-midterm; 2/3 on post-midterm)
The problem sets include programming questions. The midterm and final will be semi-closed-book exams (cheat sheets allowed: one 2-sided sheet for the midterm, two 2-sided sheets for the final) covering material presented in the lectures and assigned in the readings. The project is an open-ended three-person team project.
We do not take attendance except for the pods, but you will learn more if you attend lectures instead of watching the recordings.
Worksheets, quizzes, and surveys should be completed before your pod in the week after they are assigned.
Reading Materials
- For the mathematical side of ML: C. Bishop, Pattern Recognition and Machine Learning. 2007
- For classical ML in scikit-learn: Hands-On Machine Learning
- For deep learning in PyTorch: Dive into Deep Learning
- Example final projects: demo1 and demo2
- See also Resources and Lectures