So far we’ve been getting a lot of feedback that many people in the course feel like they’re struggling with the more mathematical problems on the homework. While it might seem really hard now, please try your best to get through it, because unfortunately this level of mathematical rigour is central to both applied and theoretical machine learning research. We recognize that it’s very tough if you haven’t taken any course covering probability or statistics, and that for many of you it has been a long time since you had to apply calculus and remember things like how to manipulate exponents.
That being said, you should do your best to avail yourself of the many resources out there to fill in the gaps in your knowledge as quickly as possible. This doesn’t necessarily mean going out and trying to plow through textbooks! Of all of probability theory and calculus, we will only require a small subset of the concepts that a textbook would cover in great detail. Instead, we’re going to provide some tips and references here that cover more specifically what you need to know to get through this course.
The most important background knowledge you will need is an intuitive understanding of probability at a basic level. In my opinion, the best resources are the probability review sessions from other courses that you can find on the web. Here are some that I think cover the basics quickly and clearly:
We’re not going to reproduce another full probability review here, but if you don’t know what linearity of expectation means, or what independence and conditional probability are, then you really should spend an hour or two reading through these reviews.
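For reference, here are those three facts in their standard form (any of the reviews above will cover them in more depth):

```latex
% Linearity of expectation: holds for any random variables X and Y,
% independent or not, and any constants a and b.
\mathbb{E}[aX + bY] = a\,\mathbb{E}[X] + b\,\mathbb{E}[Y]

% Independence: X and Y are independent iff their joint factorizes.
p(x, y) = p(x)\,p(y)

% Conditional probability (defined when p(y) > 0); rearranging it
% gives the product rule p(x, y) = p(x | y) p(y).
p(x \mid y) = \frac{p(x, y)}{p(y)}
```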
Also, an important and obvious note: while you may of course look up information on probability distributions and other background required for the course, it should go without saying that you may NOT look for solutions to homework problems on the internet.
Machine learning draws on many different disciplines that each have their own terminology and notation. Because of this, there are typically many ways of looking at a single problem, and many ways of writing the same problem down. On the one hand, this is annoying: there is often no single “right” way to write down a given model, and it can be unclear what a given equation means. On the other hand, it encourages you to look beyond the specifics of a single equation, interpret the broad meaning of what’s going on, and look for mathematical motifs or patterns. For example, as we talked about in recitation, linearity is a key theme of many models and something we’ll come back to repeatedly. Furthermore, we went over in lecture how multiplying distributions together to get a posterior can produce something with the same general form as the prior, so that all we need to do is recognize which parts of the equation are the updated parameters.
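To make that last point concrete, here is the standard conjugacy example (a Beta prior with a Bernoulli likelihood; this is an illustration, not necessarily the exact example from lecture):

```latex
% Prior: theta ~ Beta(a, b), written up to its normalizing constant.
p(\theta) \propto \theta^{a-1} (1-\theta)^{b-1}

% Likelihood of n coin flips, of which k came up heads:
p(x_{1:n} \mid \theta) = \theta^{k} (1-\theta)^{n-k}

% Posterior: multiply the two and collect exponents. The result has
% the same functional form as the prior, so we can just read off the
% updated parameters: theta | x ~ Beta(a + k, b + n - k).
p(\theta \mid x_{1:n}) \propto \theta^{a+k-1} (1-\theta)^{b+n-k-1}
```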
At a deep, fundamental level, being lazy can pay off when it comes to manipulating probabilistic expressions. My general approach to this sort of thing is as follows:
Also, remember your exponents: for example, $x^a x^b = x^{a+b}$, $(x^a)^b = x^{ab}$, and $x^{-a} = 1/x^a$.
If your maths aren’t working out, check your exponentiation — it is very easy to make a mistake there.
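As a sketch of how the lazy, proportional style and the exponent rules come together in practice (a standard Gaussian example with illustrative symbols: a prior on $\mu$ with mean $m$ and variance $s^2$, and one observation $x$ with noise variance $\sigma^2$; not a homework problem):

```latex
% Drop normalizing constants and work with proportionality; when two
% exponentials are multiplied, their exponents simply add.
p(\mu)\, p(x \mid \mu)
  \propto \exp\!\Big(\! -\frac{(\mu - m)^2}{2 s^2} \Big)
          \exp\!\Big(\! -\frac{(x - \mu)^2}{2 \sigma^2} \Big)
  = \exp\!\Big(\! -\frac{(\mu - m)^2}{2 s^2} - \frac{(x - \mu)^2}{2 \sigma^2} \Big)

% Expanding and completing the square in mu shows this is again an
% (unnormalized) Gaussian, so we just read off its parameters:
p(\mu \mid x) = \mathcal{N}\!\Big( \mu \;\Big|\;
    \frac{\sigma^2 m + s^2 x}{\sigma^2 + s^2},\;
    \frac{s^2 \sigma^2}{s^2 + \sigma^2} \Big)
```

The payoff of being lazy here is that we never had to compute the normalizing constant: recognizing the Gaussian form was enough.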