Intro

What is Machine Learning?

Definition 1. A funny thing happened on the way to AI.
Definition 2. Statistics + Algorithms
Definition 3. Fancy Function Fitting

Here’s ML about 50 years ago: from the vault.

Types of ML
Some examples

Vision: detecting faces in photographs
Your new camera detects faces in images for better focusing and light-metering. This is probably the most widely deployed example of classification. Photo management software clusters the faces in your images to index photos by the people in them.

Speech: recognizing dictation
You can dictate your blog or search the web by voice. Speech recognition systems are built using (mostly) supervised learning.

Text: News digest
Google clusters news stories into groups on the same topic and classifies them into sections like World, Sports, Entertainment, Tech, etc. The clustering step is primarily an example of unsupervised learning.

Recommendation systems: Netflix
Netflix uses your movie ratings to predict what other movies you might like. This is a very useful (and profitable) example of supervised learning, specifically regression, although it also uses unsupervised techniques like dimensionality reduction.

Games: Microsoft Xbox Kinect
Kinect uses supervised learning to detect joint positions from depth images (decision trees, actually, and a lot of them).

Learning to control robots: Pancake flippin’
Complex control policies can be learned using reinforcement learning.

ML is (often) modeling a probability distribution

Probabilistic reasoning is central to many machine learning tasks. Probability is an extremely useful way of quantifying our beliefs about the state of the world.

Generative vs. discriminative models. Many of the methods discussed in this course model one of two probability distributions: the joint distribution {$P(X,Y)$} (generative models) or the conditional distribution {$P(Y \mid X)$} (discriminative models).
Using the basic rules of probability, we can compute a discriminative posterior {$P(Y \mid X) = P(X,Y) / \sum_{Y'} P(X,Y')$} for any generative model in order to make decisions. In lecture, we showed that Naive Bayes (NB) and logistic regression (LR) can both lead to the same form of posterior, although the parameters estimated will generally be different.

Any generative model can be used as an unsupervised method. Since {$P(X) = \sum_Y P(X,Y)$}, a generative model defines a distribution over {$X$} by marginalizing out the class labels. If the class labels are unknown, we can use EM to estimate them! In this sense, a Gaussian mixture model (GMM) is just a Gaussian Naive Bayes model with unknown classes.

Graphical models cover all probability models we discussed. Graphical models are an efficient and intuitive way of encoding a set of independence assumptions about a set of random variables. As an exercise, try to draw the graphical models that represent Naive Bayes and logistic regression. Once you have learned about graphical models, it’s rare to ever talk about probabilities again without them!

ML is (often) optimization

While probability is the language by which we express our beliefs about the state of the universe, in order to talk about achieving goals we need to introduce the language of optimization.

Objective (Loss) functions. The standard machine learning paradigm is to (1) define an objective function that captures the performance of a model on a given task and (2) optimize that objective with respect to some parameters. We saw a variety of loss functions in this course, each applied to a predictive model {$f(x)$} that returns a real-valued number.
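A minimal Python sketch of what such loss functions look like. The particular selection (squared, 0/1, hinge, logistic, exponential) and the function names are assumptions chosen for illustration, not necessarily the set covered in the course; the classification losses assume labels {$y \in \{-1,+1\}$} and depend on the prediction only through the margin {$y f(x)$}.

```python
import math

# Illustrative losses; names and selection are assumptions, not taken from the course.
# y is the true label/target, fx is the real-valued model output f(x).

def squared_loss(y, fx):
    """(y - f(x))^2, the usual loss for regression."""
    return (y - fx) ** 2

def zero_one_loss(y, fx):
    """1 if sign(f(x)) disagrees with y in {-1, +1} (ties count as errors), else 0."""
    return 0.0 if y * fx > 0 else 1.0

def hinge_loss(y, fx):
    """max(0, 1 - y f(x)), a convex upper bound on the 0/1 loss."""
    return max(0.0, 1.0 - y * fx)

def logistic_loss(y, fx):
    """log(1 + exp(-y f(x))), computed in a numerically stable way."""
    m = -y * fx
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))

def exponential_loss(y, fx):
    """exp(-y f(x))."""
    return math.exp(-y * fx)

if __name__ == "__main__":
    # Compare the classification losses as the margin y*f(x) varies (y fixed at +1).
    for fx in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(fx, zero_one_loss(1, fx), hinge_loss(1, fx),
              round(logistic_loss(1, fx), 4), round(exponential_loss(1, fx), 4))
```

Printing the losses side by side shows how the convex losses keep penalizing small positive margins, while the 0/1 loss only counts outright mistakes.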
The key thing to remember is that we can often understand the behavior of an algorithm by identifying the loss function it optimizes and analyzing both that loss and how the algorithm optimizes it.

Regularization: MLE vs. MAP. We will learn that overfitting can occur if we optimize our objective on the training data too tightly. To explicitly control the extent to which we “trust” the training data, we introduce the concept of priors, or regularization. If {$\mathcal{D}_X$}, {$\mathcal{D}_Y$} are the dataset and {$\theta$} our parameters, then probabilistically we have {$ \log P(\mathcal{D}_X,\mathcal{D}_Y,\theta) = \log P(\mathcal{D}_X,\mathcal{D}_Y \mid \theta) + \log P(\theta) = -\text{loss}(\theta) + \text{regularizer}(\theta), $} where the log-likelihood plays the role of a negative loss and the log-prior plays the role of the regularizer. MLE maximizes the likelihood term alone; MAP maximizes the sum. In practice, we can substitute whatever loss function we like and whatever regularization function we like, provided we can still solve the resulting optimization problem.

Convex: Gradient descent. How do we solve optimization problems? We will see a very simple, powerful method for optimizing convex objectives: gradient descent (or ascent, for concave functions). All we need is to be able to compute a gradient (or any sub-gradient, if the function is not differentiable) at every point.

ML in practice

Problem is determined by the labels (or lack thereof). What makes a machine learning problem? The simplest way of phrasing the ML problems we’ve discussed is “given some computed features {$X$}, predict something {$Y$}.” Depending on whether or not {$Y$} is given to us, we have unsupervised (not given) vs. supervised (given) vs. semi-supervised (some {$Y$}’s are given) problems. (Note that we didn’t discuss semi-supervised learning much in this course, though it came up on the project.) Depending on the domain of {$Y$}, we end up with binary classification, regression, multi-class classification, etc.
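To make the gradient descent and MLE vs. MAP discussion above concrete, here is a minimal Python sketch. It assumes a squared loss with an L2 penalty (which corresponds to Gaussian noise and a Gaussian prior), a fixed step size, and a tiny made-up dataset; the function names are hypothetical. The penalty {$\lambda \|w\|^2$} is the negative of the log-prior term, so minimizing loss plus penalty matches maximizing the log-probability in the text.

```python
def gradient_descent(grad, w0, step=0.05, iters=5000):
    """Repeatedly step against the gradient, starting from w0, with a fixed step size."""
    w = list(w0)
    for _ in range(iters):
        g = grad(w)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w

def make_ridge_gradient(X, y, lam):
    """Gradient of J(w) = (1/n) * sum_i (w.x_i - y_i)^2 + lam * ||w||^2."""
    n = len(X)
    def grad(w):
        g = [2.0 * lam * wi for wi in w]          # gradient of the penalty term
        for xi, yi in zip(X, y):
            residual = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                g[j] += 2.0 * residual * xj / n   # gradient of the squared loss
        return g
    return grad

# Made-up data generated by y = 1 + 2x; the first feature is a constant bias term.
# For simplicity the bias weight is penalized too, which is often avoided in practice.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

w_mle = gradient_descent(make_ridge_gradient(X, y, lam=0.0), [0.0, 0.0])  # no prior
w_map = gradient_descent(make_ridge_gradient(X, y, lam=1.0), [0.0, 0.0])  # with prior

if __name__ == "__main__":
    print("MLE weights:", w_mle)
    print("MAP weights:", w_map)
```

With `lam=0.0` the iterates approach the weights that generated the data, while `lam=1.0` pulls the weights toward zero, which is the "trust the training data less" effect of the prior.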
One take-away from this course, then, should be knowing how to approach a real-world problem and formulate it as a machine learning task, with some idea of which algorithms you could use to begin tackling it.

Cross-validation. As described in the video, cross-validation is an incredibly important tool in an ML practitioner’s toolbox. It’s critical for training, evaluating, and selecting between different models. At this point, you should know how to approach a problem, divide the data into training and test sets, and compare different algorithms on that problem.

Lessons from the project. The final project is arguably the most practical learning experience of the course. You will see that overfitting is a crucial problem, and that it arises both in algorithmic estimation and through the choices of human designers.
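The cross-validation workflow mentioned above (split the data, train on one part, evaluate on the held-out part, compare models) can be sketched in plain Python. This is only an illustration under stated assumptions: the two models (a constant-mean predictor and a one-dimensional least-squares line), the synthetic data, and all function names are hypothetical, and the folds are contiguous, whereas in practice the data would usually be shuffled first.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_error(fit, predict, X, y, k=5):
    """Mean squared error over held-out points, averaged across all k folds."""
    n, total = len(X), 0.0
    for held_out in k_fold_indices(n, k):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        total += sum((predict(model, X[i]) - y[i]) ** 2 for i in held_out)
    return total / n

# Model A: always predict the training mean.
def fit_mean(X, y):
    return sum(y) / len(y)

def predict_mean(model, x):
    return model

# Model B: one-dimensional least-squares line (closed form).
def fit_line(X, y):
    xbar, ybar = sum(X) / len(X), sum(y) / len(y)
    sxx = sum((x - xbar) ** 2 for x in X)
    sxy = sum((x - xbar) * (t - ybar) for x, t in zip(X, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept

# Synthetic data: y = 2x + 1 plus a small deterministic perturbation.
X = [float(i) for i in range(10)]
y = [2.0 * x + 1.0 + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(X)]

err_mean = cross_val_error(fit_mean, predict_mean, X, y, k=5)
err_line = cross_val_error(fit_line, predict_line, X, y, k=5)

if __name__ == "__main__":
    print("CV error, mean predictor:", err_mean)
    print("CV error, line:", err_line)
```

Because each error is measured only on points the model did not see during fitting, the comparison between the two models is not flattered by overfitting to the training set.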