Linear Algebra Review

Linear Algebra Notes and Resources:
Linear Algebra lectures by Professor Gil Strang at MIT
A Tutorial on Linear Algebra by Professor C. T. Abdallah
Linear Algebra Review by Professor Fernando Paganini, UCLA

Machine learning relies on two basic areas of mathematics: linear algebra and probability. This lecture is a brief review of both subjects that you are expected to go over on your own. If you feel that you need more practice, a good book for matrices and linear algebra is Matrix Analysis and Applied Linear Algebra by Carl D. Meyer, and a good book for probability is A First Course in Probability by Sheldon Ross. Any other books that you are comfortable with are fine, as this course simply uses the information in the books and does not teach it.

NOTE: If the topics covered in this review are entirely new to you, we STRONGLY urge you to reconsider taking this class! Either drop the course, or take an incomplete and finish it later; the material in this section is only a review, and it is not intended to teach you the material for the first time.

Matrices and Linear Algebra

One description of Machine Learning is that it is fancy curve fitting: you have one or more linear equations that make a prediction of some kind based on some number of inputs. For the moment, we're going to sidestep how you choose those linear equations and their coefficients, and concentrate only on what you do once you have them (the bulk of the course is about finding those equations and their coefficients, so you'll see all of that soon enough). So, let's jump in:

{$ \mathbf{y = X w} $}

Thus far, this is exceedingly uninformative. Let's break it down into its parts to see what is going on.
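Before breaking it down, here is a minimal sketch in Python (assuming numpy) of what evaluating {$ \mathbf{y = X w} $} looks like in code; the particular inputs and weights are made up purely for illustration.

```python
import numpy as np

# Each row of X is one set of inputs (one example); each column is one input variable.
# The numbers here are made up purely for illustration.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# w holds one weight per input variable (how these weights are chosen
# is the subject of the rest of the course).
w = np.array([0.5, -1.0, 2.0])

# y = X w gives one predicted output per row of X.
y = X @ w
print(y)  # [ 4.5  9.  13.5]
```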
So another way of looking at {$ \mathbf{y = X w} $} is:

{$ \left[\begin{array}{c} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{array}\right] = \left[\begin{array}{ccccc} X_{11} & X_{12} & X_{13} & \ldots & X_{1m} \\ X_{21} & X_{22} & X_{23} & \ldots & X_{2m} \\ X_{31} & X_{32} & X_{33} & \ldots & X_{3m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & X_{n3} & \ldots & X_{nm} \end{array}\right] \times \mathbf{w} $}

{$ \mathbf{w} $} is the most interesting part of the expression above. We have some set of inputs, we have some set of outputs we're interested in, and we believe there is a relation between the two. We need {$ \mathbf{w} $} to be an accurate representation of that relation.

Let's start with the simplest relation we can define: copy some input variable to the output. So we choose one and do so:

{$ y_1 = X_{11} $}

This is a good start, but what if our output {$ y_1 $} depends on several different inputs, the way today's weather depends on the temperature over the past several days, the humidity, and so on? Clearly, a better approximation is a combination of the inputs:

{$ y_1 = X_{11} + X_{12} + X_{13} + \ldots $}

So far, so good. But what happens if we try to predict, say, the humidity using the same form of equation?

{$ y_2 = X_{21} + X_{22} + X_{23} + \ldots $}

Here we're giving the same weight to each of the inputs, which is a problem: humidity is measured as a percentage and must stay in the range [0.0, 100.0], whereas the temperature in some parts of the world goes over 100.0. This is a problem not just because the output might be out of range, but also because the importance of each input may differ depending on what we're trying to predict. It would be best if we could weight the inputs individually:

{$ y_2 = X_{21} w_1 + X_{22} w_2 + X_{23} w_3 + \ldots $}

Doing this for every output gives the full regression system:

{$ \begin{array}{ccccccccc} y_1 & = & X_{11} w_1 & + & X_{12} w_2 & + & \ldots & + & X_{1m} w_m \\ y_2 & = & X_{21} w_1 & + & X_{22} w_2 & + & \ldots & + & X_{2m} w_m \\ \vdots & = & \vdots & + & \vdots & + & \ddots & + & \vdots \\ y_n & = & X_{n1} w_1 & + & X_{n2} w_2 & + & \ldots & + & X_{nm} w_m \end{array} $}

Now we have a way of controlling the influence of any particular input on any particular output value. But… we're lazy. We keep repeating the same {$ w_1, \cdots, w_m $} on each line, and it would be nice if there were a simple way of avoiding writing them over and over again. In short:

{$ \left[\begin{array}{c} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{array}\right] = \left[\begin{array}{ccccc} X_{11} & X_{12} & X_{13} & \ldots & X_{1m} \\ X_{21} & X_{22} & X_{23} & \ldots & X_{2m} \\ X_{31} & X_{32} & X_{33} & \ldots & X_{3m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & X_{n3} & \ldots & X_{nm} \end{array}\right] \times \left[\begin{array}{c} w_{1} \\ w_{2} \\ \vdots \\ w_{m} \end{array} \right] $}

Notice that the matrix has {$n$} rows and {$m$} columns. By setting the matrix up this way, we can have a different number of inputs and outputs, which can be very handy. While we're at it, we can also see how matrices and vectors multiply against one another: each row of the matrix is multiplied against the input column, producing one output, {$ y_i = X_{i*} \mathbf{w} $}. These are the basics of matrix multiplication.
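To make the row-by-row picture concrete, here is a small sketch (again assuming numpy, with invented values) that computes each {$ y_i $} as the weighted sum {$ X_{i1} w_1 + \ldots + X_{im} w_m $} and checks that it agrees with the matrix-vector product.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # n = 2 outputs, m = 3 inputs
w = np.array([0.1, 0.2, 0.3])     # one weight per input

# Row by row: y_i = X_i1*w_1 + X_i2*w_2 + ... + X_im*w_m
y_by_hand = np.array([sum(X[i, j] * w[j] for j in range(X.shape[1]))
                      for i in range(X.shape[0])])

# The same computation written as a matrix-vector product.
y_matrix = X @ w

print(y_by_hand)                          # [1.4 3.2]
print(np.allclose(y_by_hand, y_matrix))   # True
```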
A few things to notice about multiplying two matrices together: if the first matrix has {$k$} rows and {$l$} columns, the second matrix must have {$l$} rows; it may have any number of columns, say {$m$}, and the product then has {$k$} rows and {$m$} columns. This should be obvious from the example above: if the number of columns in {$\mathbf{X}$} were not the same as the number of rows in {$\mathbf{w}$}, there would be either too many coefficients or too few, which would make the multiplication illegal.

Examples

Here are a couple of worked examples:

{$\left[\begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ \end{array} \right] \times \left[\begin{array}{c} 1 \\ 2 \\ 3 \end{array}\right] = \left[\begin{array}{c} 1*1 + 2*2 + 3*3 \\ 4*1 + 5*2 + 6*3 \\ 7*1 + 8*2 + 9*3 \end{array}\right] = \left[\begin{array}{c} 14 \\ 32 \\ 50 \end{array}\right]$}

{$\left[\begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \\ \end{array} \right] \times \left[\begin{array}{c} 1 \\ 2 \\ 3 \end{array}\right] = \left[\begin{array}{c} 1*1 + 2*2 + 3*3 \\ 4*1 + 5*2 + 6*3 \\ 7*1 + 8*2 + 9*3 \\ 10*1 + 11*2 + 12*3 \end{array}\right] = \left[\begin{array}{c} 14 \\ 32 \\ 50 \\ 68 \end{array}\right]$}
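The two worked examples above can be checked in a few lines of Python (assuming numpy); the shape-mismatch case at the end illustrates why the number of columns in the first matrix must match the number of rows in the second.

```python
import numpy as np

v = np.array([1, 2, 3])

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(A @ v)  # [14 32 50]

B = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 7,  8,  9],
              [10, 11, 12]])
print(B @ v)  # [14 32 50 68]

# A 4x3 matrix times a length-4 vector is illegal: 3 columns vs. 4 rows.
try:
    B @ np.array([1, 2, 3, 4])
except ValueError as err:
    print("shape mismatch:", err)
```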