MLMath

How to solve ML problems that require math

So far we've been getting a lot of feedback that many people in the course feel like they are struggling with the more mathematical problems on the homework. While it might seem really hard now, please try your best to get through it, because this level of mathematical rigour is unfortunately central to both applied and theoretical machine learning research. We recognize that it is very tough if you haven't taken any course that covered probability or statistics, and that for many of you it has been a long time since you had to apply calculus or remember things like how to manipulate exponents.

That being said, you should do your best to avail yourself of the many resources out there to fill in the gaps in your knowledge as quickly as possible. This doesn't necessarily mean going out and trying to plow through textbooks! Of all of probability theory and calculus, we will only require a very small subset of the concepts that a textbook would cover in great detail. Instead, we're going to provide some tips and references here that cover more specifically what you need to know to get through this course.

Background in probability

The most important background knowledge you will need is an intuitive understanding of probability at a basic level. In my opinion, the best resources are the probability review sessions from other courses that you can find on the web. Here are some that I think cover the basics very quickly and clearly:

We're not going to reproduce a full probability review here, but if you don't know what linearity of expectation means, or what independence or conditional probability mean, then you really should spend an hour or two reading through these reviews.
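If a simulation helps your intuition, here is a minimal sketch of linearity of expectation (the variables and constants below are our own, purely for illustration): the sample average of {$aX + bY$} matches {$a\mathbf{E}[X] + b\mathbf{E}[Y]$} even though {$X$} and {$Y$} here are dependent — linearity of expectation never requires independence.

```python
import random

# Empirically check linearity of expectation: E[aX + bY] = a E[X] + b E[Y].
# Note that Y is built from X, so X and Y are NOT independent -- linearity
# holds anyway.
random.seed(0)
n = 200_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]  # Y = X + noise

a, b = 2.0, -3.0
lhs = sum(a * x + b * y for x, y in zip(xs, ys)) / n   # sample mean of aX + bY
rhs = a * sum(xs) / n + b * sum(ys) / n                # a*mean(X) + b*mean(Y)
print(abs(lhs - rhs))  # essentially zero (float rounding only)
```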

Also, an important and obvious note: even though you may look for information related to probability distributions and other required background for the course, it should go without saying that you may NOT look for solutions to homework problems on the internet.

A high level look at speaking the language of ML

Machine learning involves many different disciplines that all have their own terminology and notation. Because of this, there are typically many ways of looking at a single problem, and many ways of writing the same problem down. On the one hand, this is annoying because there is often no single "right" way to write down a given model, and it can be unclear what a given equation means. On the other hand, it encourages you to look beyond the specifics of a single equation, interpret the broad meaning of what's going on, and look for mathematical motifs or patterns. For example, as we talked about in recitation, linearity is a key theme of many models and something we'll repeatedly come back to. Furthermore, we went over in lecture how multiplying distributions together to get a posterior can yield the same general form as the prior, so that all we need to do is recognize which parts of the equation are the parameters.
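As one concrete instance of that prior/posterior pattern, consider the Beta-Bernoulli pair: the prior {$\mathrm{Beta}(a, b) \propto \theta^{a-1}(1-\theta)^{b-1}$} times the likelihood {$\theta^k(1-\theta)^{n-k}$} for {$k$} heads in {$n$} flips is proportional to {$\theta^{a+k-1}(1-\theta)^{b+n-k-1}$}, which we recognize as {$\mathrm{Beta}(a+k,\, b+n-k)$} — no integral required. A tiny sketch (the helper name below is ours, just for illustration):

```python
# Beta-Bernoulli conjugacy: the posterior has the same functional form as the
# prior, so instead of evaluating any integral we just read off the updated
# parameters from the exponents.
def beta_bernoulli_posterior(a, b, k, n):
    """Posterior Beta parameters after observing k successes in n trials,
    starting from a Beta(a, b) prior."""
    return a + k, b + (n - k)

# Beta(2, 2) prior, then 7 heads out of 10 flips:
print(beta_bernoulli_posterior(2, 2, 7, 10))  # (9, 5)
```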

Try to avoid dealing with ugly math at all costs

At a deep, fundamental level, being lazy can pay off when it comes to manipulating probabilistic expressions. My general approach is as follows:

  1. Experiment by expanding terms or plugging in definitions: a lot of problems can be solved just by plugging in definitions in the right order.
  2. Apply linearity of expectation as many times as you can.
  3. Ideally, you can manipulate things so that you only have terms like {$\mathbf{E}[X]$}, whose value you know because {$X$} follows a common distribution.
  4. If you're multiplying distributions, isolate constant terms and try to figure out if any of the rest of the equation matches any part of a distribution that you know already.
  5. If you're working with a distribution, try dropping any normalizing constant and just working with non-constant terms. You might be able to group terms now that you didn't see before.
  6. If you can't seem to do anything else, try tricks like writing {$a = \exp\{\log a\}$} to see whether things are easier to manipulate once the log has converted multiplication to addition. Alternatively, see whether any sums or integrals can be solved using a standard identity like those found on the "cheat sheet" linked to above.
  7. If it's been a while since you've seen derivatives, the most commonly used rules are all right here: list of differentiation identities.
  8. Finally, if all else fails, try to evaluate integrals or sums. NEVER START BY EVALUATING INTEGRALS OR SUMS!
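To see the log trick from step 6 pay off in practice, here is a short sketch (the probabilities are made up for illustration): multiplying many small probabilities directly underflows to zero in floating point, while summing their logs stays perfectly stable.

```python
import math

# Multiplying 100 probabilities of 1e-5 each gives 1e-500, far below the
# smallest representable double (~1e-308), so the direct product underflows.
probs = [1e-5] * 100

direct = 1.0
for p in probs:
    direct *= p          # underflows to exactly 0.0

# In log space the product becomes a sum, which is numerically harmless.
log_prod = sum(math.log(p) for p in probs)  # = 100 * log(1e-5)

print(direct)    # 0.0
print(log_prod)  # about -1151.29
```

This is why log-likelihoods, rather than raw likelihoods, show up everywhere in ML.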

Also, remember your exponents: for example,

  • {$\log(ab) = \log(a) + \log(b)$}
  • {$\log(a/b) = \log(a) - \log(b)$}
  • {$e^{a \log b} = \left(e^{\log b}\right)^a = b^a$}
  • {$e^a\cdot e^b = e^{a+b}$}
  • {$\log (ae^b) = \log(a) + b$}

If your math isn't working out, check your exponent manipulations -- it is very easy to make a mistake there.
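If you ever doubt one of the identities above mid-derivation, a quick numeric spot-check settles it (the values of {$a$} and {$b$} below are arbitrary):

```python
import math

# Numeric spot-checks of the log/exponent identities listed above.
a, b = 3.7, 1.9
assert math.isclose(math.log(a * b), math.log(a) + math.log(b))
assert math.isclose(math.log(a / b), math.log(a) - math.log(b))
assert math.isclose(math.exp(a * math.log(b)), b ** a)
assert math.isclose(math.exp(a) * math.exp(b), math.exp(a + b))
assert math.isclose(math.log(a * math.exp(b)), math.log(a) + b)
print("all identities check out")
```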