MatlabTutorial

Goals

There are two goals of this Matlab tutorial:

To get Matlab beginners up to speed with relevant portions of Matlab that will be needed for this course.
To teach practical "tips and tricks" to help with debugging, testing, etc. of machine learning Matlab programs.

For the first goal, I will start with the very basics: a highlight of the useful parts of the Matlab interface, key features of programming in Matlab that differentiate it from other languages, how Matlab will make beautiful plots for you, how to use the tools that Matlab provides for debugging and profiling your code, and, most importantly, how to use the vast and enormously helpful documentation library that Matlab ships with.

For the second goal, we'll go through a detailed example of a typical machine learning programming assignment. Debugging a machine learning program can sometimes be tricky; the goal of the program is to "learn" a model of some sort, and in many cases, there is no immediately obvious "correct" output to compare against.

Concepts - Using the Matlab Desktop

Here are some simple lecture notes that I typed up for the initial part of the tutorial, where I describe how to use the Matlab Desktop. Since this is focused on concepts, it's easier to explain in Wiki format than in comments in an M-file (like the remainder of the tutorial).

The GUI

First, the basics. The Matlab Desktop works a lot like a typical IDE, such as Visual Studio or Eclipse: the display is broken down into resizable frames into which you can drag and drop various window components. Most likely the default configuration you will see upon starting Matlab for the first time is a three-frame view, with four windows loaded by default into the three frames. We'll talk briefly about how to use each of these windows.

Command Window: The Command Window is where the magic happens. Essentially, the Command Window is a very powerful interactive shell. The Matlab shell prompt is the simple double right angle brackets, >>. Just like any other shell, you can navigate the file structure, run scripts and programs, create and destroy variables, and so forth. In addition, the Command Window searches a path variable to determine the location of any command that you enter (more on this later). A lot of standard UNIX commands are built into Matlab, like cd, pwd, ls, etc., as well as Windows-style commands like dir, so it should feel fairly familiar. On UNIX systems, you can access a sh shell by prefixing any command with ! (or with the system command), so that you can do pretty much everything that you will ever need to do right from inside of the Matlab Command Window. Finally, the Matlab shell is where you run Matlab scripts and programs (saved in files with a .m extension) and access the vast scientific computing libraries and capabilities that Matlab provides.
Command History: The Command History shows a history of the commands you've entered into the Command Window. Right click on an entry, and many useful little shortcuts will appear, e.g., creating new scripts from old commands or evaluating them.
Current Directory: A standard file browser, which also lets you change the current directory of the Command Window prompt without needing to run shell commands. Again, try right clicking to bring up useful shortcuts.
Workspace: This may not be in focus by default (you may need to click on it to bring it up). The Workspace window is a list of all the variables that you have created so far in your Matlab session; most people tend to ignore it.
Editor: An excellent code and text editor, which you can access by entering edit in the Command Window. If you're used to Emacs, never fear -- in Preferences, you can set the Editor to emulate Emacs keybindings. The Editor is also Matlab's debugger, and very useful in that regard.
Help: THE MOST IMPORTANT WINDOW OF ALL!!! The Help window is not open by default, and many Matlab first-timers take a long time to find it (e.g., me) because the help command does NOT open up the Help window. Rather, the doc command does. The help command prints out whatever comments are at the top of a given function or script file, which presumably offer some pointers on how the command is intended to be used. For instance, try help save. To access the full documentation for a given command, use doc save (or whatever command you are interested in). doc will also provide you with a table of contents, and direct you to the innumerable tutorials, demos, and videos that Matlab provides to help you get started on your computational quest.

Concept: The Workspace

In addition to being the name of a window, the Workspace is Matlab's general term for anything that is currently loaded into Matlab's memory. When you create a new variable, it goes in the workspace; when you load data from saved files, they are loaded into the workspace. You can save your entire current workspace with the save command, like this:

>> save myworkspace.mat

This tells Matlab to save the current workspace into the Matlab compressed data format file myworkspace.mat. Note: mat-files are typically the most effective way of storing your data when working with Matlab, since the data is automatically compressed.

Similarly, you can load variables saved in mat files using the load command, e.g.

>> load myworkspace.mat

You can clear variables from the workspace using the clear command. All very straightforward so far.
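
As a rough sketch of the save/clear/load round trip described above, the following saves two variables, clears them, and loads them back. The variable names and the throwaway file location (built with tempname) are assumptions chosen for illustration; the function form save(filename, ...) does the same job as the command form shown earlier.

```matlab
% Create two example variables in the workspace (names are illustrative).
scores = [90 85 77];
course = 'machine learning';

% Build a throwaway file name so the example does not clutter your directory.
matFile = [tempname '.mat'];

% Save only the named variables; save(matFile) alone would save everything.
save(matFile, 'scores', 'course');

% Remove the variables, then check that they are really gone.
clear scores course
stillThere = exist('scores', 'var');   % 0 means "no such variable"

% Load them back from the mat-file, then delete the temporary file.
load(matFile);
delete(matFile);
disp(scores)
```

Saving only the variables you name keeps the mat-file small and avoids overwriting unrelated variables when you load it again later.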

Workspace scope, and scripts vs. functions

In Matlab, there are two ways to build up libraries of commands that you can use in the future. First, you can simply load up the editor and start typing away. If you save your work to an .m file on Matlab's path (or in the current directory), you can run those saved commands by entering the name of the file (without extension) into the Command Window. This is called a script, and it works much like typing the commands into the Command Window in order. If one of the commands generates an error, the script stops at that point, but everything the earlier commands did stays in effect. Furthermore, because the commands are in effect entered into the Command Window, the script will work directly in your current workspace. So, if you have a workspace variable called homework_results into which you've put all of the results of your first assignment, and you accidentally run a script that initializes homework_results to empty, then you will lose whatever used to be in homework_results.

The other, more powerful way of using code is to create functions. These have a specific header at the top:

function [outputs] = myfunction(inputs)

which tells Matlab that this .m file is defining a new function called myfunction that returns outputs and takes in inputs. Multiple arguments for both outputs and inputs can be separated with commas -- yes, that's right, Matlab allows you to return multiple output variables. This is a somewhat useful feature that leads to very lazy programming, so I don't know if I should have told you about it. In any case, the important point is that a function runs in its own private workspace and, in normal use, cannot modify the caller's workspace in any way. Arguments are passed by value, so there is no passing by reference and there are no pointers; instead, to modify a workspace variable with a function requires usage like:

>> A = do_something(A);

which would overwrite the current value of variable A with whatever is returned from the function defined in do_something.m
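
To make the pass-by-value behaviour concrete without writing a new .m file, here is a small sketch that uses two built-in functions, sort and max, in place of the hypothetical do_something. The variable names are assumptions for illustration only.

```matlab
A = [3 1 2];

% Calling a function without assigning its result leaves A untouched:
% sort works on a copy and the sorted copy is simply discarded.
sort(A);
unchanged = isequal(A, [3 1 2]);   % true: A was not modified

% To actually change A, assign the return value back to it.
A = sort(A);                       % A is now [1 2 3]

% Functions can return several outputs; list them in square brackets.
[largest, position] = max(A);      % largest value and where it sits
```

The same pattern applies to functions you write yourself: whatever happens inside the function stays inside unless it is returned and assigned.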

Concept: Basic variable types

Matlab has a lot of different variable types, classes, etc., but for the most part you will rarely see more than a handful of them. The most important ones are numeric arrays, strings, cell arrays, structs, and graphics handles.

Numeric arrays

A numeric array is a scalar, vector, matrix, or in general an n-dimensional array. Unlike other programming languages, Matlab is made so that matrix equations can be translated from paper into Matlab code very quickly:

Brackets. Matlab uses [ and ] to denote concatenation, like you would in a math paper. E.g. entering the command

>> A = [1  2
        3  4]

is equivalent to writing {$A = \left[\begin{array}{cc}1 & 2\\ 3 & 4 \end{array}\right]$} in terms of a natural equation. Equivalently we could have written A = [1,2; 3,4] using semi-colons to denote vertical concatenation and commas to denote horizontal concatenation.
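
The same bracket notation also glues existing arrays together. The following is a minimal sketch; the variable names wide, tall and row are assumptions used only for this example.

```matlab
A = [1, 2; 3, 4];      % commas separate columns, semicolons separate rows

wide = [A, A];         % horizontal concatenation: a 2x4 matrix
tall = [A; A];         % vertical concatenation: a 4x2 matrix
row  = [A(1,:), 5];    % append a scalar to the first row: [1 2 5]

disp(size(wide))
disp(size(tall))
```

Concatenation only works when the pieces line up: horizontal concatenation needs equal row counts, and vertical concatenation needs equal column counts.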

Parentheses. Matlab uses ( and ) to denote subscripts. NOTE: Matlab uses 1-based indexing, not 0-based indexing like C or Java. So A(1,1) would retrieve the value 1 from the example above, not 4. Make note of this point, since it can lead to a lot of frustration and problems for anyone new to Matlab who is used to a language like C or Java. For example, if you want to loop over all the elements in an array, your loop would look like

for i = 1:numel(x)
    x(i) = y(i);
end

and not

for i = 0:numel(x)-1
   x(i) = y(i);
end

which would give an error; there is no zeroth element!
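
If you want to see the failure safely, the sketch below wraps the zero index in try/catch so the error is reported instead of stopping your session. The variable names are assumptions, and the exact wording of the error message differs between Matlab versions.

```matlab
x = [10 20 30];

first = x(1);      % the first element lives at index 1
last  = x(end);    % 'end' is shorthand for the last valid index

% Indexing with 0 raises an error; catch it so we can look at the message.
zeroIndexFailed = false;
try
    bad = x(0);
catch err
    zeroIndexFailed = true;
    disp(err.message);
end
```

Using end instead of numel(x) is a handy way to reach the last element without any off-by-one arithmetic.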

Operations. Just as Matlab tries to make typing matrices intuitive, it also makes matrix operations easy. If you have two matrices, A and B, and you want to multiply them, then the command

A*B

will perform matrix multiplication. Matrix operations are the default for many operators that you will use frequently, including *, / and ^. If you want to multiply two matrices together element-by-element, then you can do so by adding a period before the operator: .*, ./ and .^.
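
The difference is easiest to see on a small example. This is a sketch with two made-up 2x2 matrices; the result names are assumptions for illustration.

```matlab
A = [1 2; 3 4];
B = [5 6; 7 8];

matrixProduct  = A * B;    % rows of A times columns of B: [19 22; 43 50]
elementProduct = A .* B;   % entry-by-entry product:       [5 12; 21 32]

matrixSquare  = A ^ 2;     % same as A * A:                [7 10; 15 22]
elementSquare = A .^ 2;    % each entry squared:           [1 4; 9 16]
```

Forgetting the period is a common source of silently wrong answers when the matrices happen to be square, because both versions run without error.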

Other variable types

In Matlab, strings are denoted with single quotes, e.g. 'string'. To make an array of strings, you need to use a cell array, which is treated just like a matrix, but where each entry is a "cell" that can contain any type of object, not simply a number. Instead of square brackets, curly braces are used for cell arrays, e.g.

mycell = {'cell arrays', 'rock'}

Finally, other useful variable types are the struct and graphics handles. Structs are best demonstrated by example, so we save those for later. We'll talk about graphics next.
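
One detail that trips people up with cell arrays is the difference between curly-brace and parenthesis indexing. The sketch below illustrates it; the names firstWord, subCell, mixed and sentence are assumptions for this example.

```matlab
mycell = {'cell arrays', 'rock'};

firstWord = mycell{1};   % curly braces return the contents: a string
subCell   = mycell(1);   % parentheses return a smaller cell array (1x1)

% A cell can hold anything, so mixed types are fine.
mixed = {42, 'text', [1 2 3]};

% Strings are character arrays, so square brackets concatenate them.
sentence = [mycell{1} ' ' mycell{2}];
disp(sentence)
```

As a rule of thumb, use {} when you want what is inside a cell and () when you want a subset of the cell array itself.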

Figures and plotting

Once you start using Matlab, you will never make graphs in Excel ever again. Here's a basic overview of how Matlab makes beautiful graphics for you. Why do I need to know this, you ask? Because part of the programming assignments will be turning in plots of your results, and because plotting is an extremely useful tool for interacting with and getting to know a dataset. Before doing anything else, plot!

Now that that's out of the way, let's talk about the high level view of how Matlab organizes your graphics:

The idea is that a figure is just a container; all the actual graphics are drawn to an axes object, which contains both a y and an x axis, any labels, tick marks, colorations, legends, etc., generated automatically. To make kickin plots with multiple graphics organized side-by-side, you use subplot to put multiple axes into one figure container. (See doc subplot).
Technically, a figure is a window (like any other, that can be snapped in and out of frames, etc.) that can hold one or more axes objects. Use figure to create figures, and gcf to get a graphics handle variable to the current figure. Running one of Matlab's plotting commands will create axes for you, which you can get a handle to using gca. Handles can be used to experiment with and set properties of graphics: use get(gca) to see a list of all the properties of the current axes and set(gca, 'Property', Value) to set the property named 'Property' to the given Value.
Matlab is designed so that you rarely have to use gca and gcf and deal with graphics handles, because there are many useful plotting commands that operate on the current figure and current axes by default (for instance, the plot command is used for most generic plotting purposes.) If you put several of these commands in a row, it will appear that Matlab is only running the last one -- in reality, each plot command is using the single current axes and overwriting the output of the previous command.

In practice, creating plots is very simple. The process goes like this:

First, you have some numeric data that you wish to plot, typically in matrix or vector form.
So, you (optionally) create or focus on a figure that you want the plot to go into, and then call one of Matlab's many graphics commands to start drawing the figure you envision. By default, commands create new axes and overwrite any old ones; to add additional graphics to existing axes, you turn on hold with the command hold on, run as many graphics commands as you like, and then turn hold off with hold off.
Next, you customize the plot using helpful shortcuts like xlim (to set x-axis limits), ylim, ylabel, xlabel, legend, and so forth. Alternatively, you can use the get and set commands above to access advanced options, or double click on an element of the figure to activate the figure configuration GUI.
Finally, you can get a beautiful vector-based PDF of your figure by using the Export menu from the Figure, or by calling the print -dpdf filename.pdf command when your current figure is highlighted.
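
The steps above can be sketched in a few lines. The data, labels, font size and output file name are all assumptions made up for this example, and the figure is created hidden ('Visible', 'off') only so the sketch also runs without a display; drop that option when working interactively.

```matlab
% Step 1: some numeric data to plot.
x = linspace(0, 2*pi, 100);

% Step 2: create a figure and draw into it. 'hold on' keeps the first
% curve in place while the second one is added to the same axes.
fig = figure('Visible', 'off');
plot(x, sin(x), 'b-');
hold on;
plot(x, cos(x), 'r--');
hold off;
ax = gca;                          % handle to the axes that plot created

% Step 3: customize with the shortcut commands, or with set on a handle.
xlim([0 2*pi]);
xlabel('x');
ylabel('value');
legend('sin(x)', 'cos(x)');
set(ax, 'FontSize', 12);

% Step 4: export the figure to a PDF in the system's temporary directory.
pdfFile = fullfile(tempdir, 'example_plot.pdf');
print(fig, '-dpdf', pdfFile);
```

Without the hold on / hold off pair, the second plot call would replace the sine curve instead of adding to it.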

OK, that was all pretty vague and probably didn't make all that much sense. However, check out the remainder of the tutorial in example code below, and you will see that it works very simply and easily. Furthermore, the graphics commands are documented very well. Start with doc plot and you will see some awesome examples of how to easily make really visually appealing figures.

Vectorization and indexing

Indexing. Suppose that A is an nxn matrix. How do we access and assign specific elements of the matrix? Accessing individual elements is easy. To access the element at the 5th row and 3rd column, we would write

A(5,3)

Again, note that this is using 1-based indexing. Now, if we want to get the entire 5th row of A, we can use the : (colon) operator, which tells Matlab to get everything:

A(5,:)

We can also access multiple specific elements using a vector of numbers. To get the 5th row's 2nd, 3rd and 4th elements, we would write

A(5, [2 3 4])

The colon operator can also be used here. By placing numbers on each side of the colon, we get a list of numbers. The command

2:4

will therefore return the vector

[2 3 4]

and so we can access the same elements of A as before using the command

A(5, 2:4)
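
Here are the same indexing forms applied to a concrete matrix. This is a sketch: the 6x6 matrix is an assumption built so that A(i,j) = 6*(i-1) + j, which makes the results easy to check by hand.

```matlab
% Build a 6x6 matrix whose rows are 1:6, 7:12, ..., 31:36.
A = reshape(1:36, 6, 6)';

single   = A(5, 3);         % one element: 27
wholeRow = A(5, :);         % entire 5th row: 25 26 27 28 29 30
picked   = A(5, [2 3 4]);   % chosen columns: 26 27 28
ranged   = A(5, 2:4);       % same elements via the colon operator

% The same syntax works on the left-hand side to assign elements.
A(5, 2:4) = 0;
disp(A(5, :))
```

After the assignment, the 5th row reads 25 0 0 0 29 30 while every other row is unchanged.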

Logical indexing. If we have a scalar variable, then a comparison operator such as > or == will return a single value of either 0 or 1 (false or true). These operators can also be used on matrices, and the result is a logical matrix where the operator has been applied to each element. Using a logical matrix like this, we can retrieve elements of a matrix that satisfy some condition. For example,

A(A>5)

will return all the values of A which are greater than 5. This is known as logical indexing. If you want to find which rows and columns these occur at, use the find command:

[row col] = find(A>5);
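
A small worked example, using a made-up 2x3 matrix, shows what each of these commands returns. Note that Matlab walks through the matrix column by column, so the selected values come back in that order.

```matlab
A = [1 8 3;
     9 2 7];

mask   = A > 5;       % logical matrix: [0 1 0; 1 0 1]
bigVals = A(mask);    % values greater than 5, as a column: [9; 8; 7]

% find reports where the true entries are, one (row, col) pair each.
[row, col] = find(A > 5);   % row = [2; 1; 2], col = [1; 2; 3]

% Logical indexing also works for assignment: cap large values at 5.
capped = A;
capped(capped > 5) = 5;
```

The assignment form at the end is often all you need, with no call to find at all.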

Vectorization. If you find that your code has a large number of for loops and is particularly slow, you might want to see if you can vectorize your code. Vectorization is the process of using a single command to operate on every element of a matrix at the same time rather than looping over every element. For example, to take the mean of each row of a matrix A, we can write

m = zeros(size(A,1),1);
for i = 1:size(A,1)
    m(i) = mean(A(i,:));
end

but it is both faster and simpler to just do

m = mean(A, 2);

Many built-in Matlab functions allow you to pass in an entire matrix as input even if the operation you're trying to do is on individual elements or rows. Try to vectorize your code as much as possible; it will greatly speed it up and make it easier to read. Take a look at the sample KNN code for a good example of vectorization.
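
As another illustration in the spirit of nearest-neighbour code, the sketch below computes the distance from one query point to every row of a data matrix, first with a loop and then vectorized. This is not the course's sample KNN code; the data and variable names are assumptions, and bsxfun is used so that the subtraction also works on older Matlab versions.

```matlab
X = [0 0;
     3 4;
     6 8];        % three 2-D data points, one per row
q = [0 0];        % query point

% Loop version: one distance at a time.
dLoop = zeros(size(X, 1), 1);
for i = 1:size(X, 1)
    dLoop(i) = sqrt(sum((X(i, :) - q) .^ 2));
end

% Vectorized version: subtract q from every row at once, then reduce
% along each row with sum(..., 2).
diffs = bsxfun(@minus, X, q);
dVec  = sqrt(sum(diffs .^ 2, 2));   % [0; 5; 10]
```

Both versions give the same distances, but the vectorized one has no explicit loop and scales better as the number of rows grows.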

Debugging

We always have to deal with bugs in everything we do. Luckily, Matlab makes it a little easier to debug by incorporating the debugger into the editor directly. When debugging, the first thing you will want to do is set a debugging stop point any time there is an error:

 >> dbstop if error;

Now, instead of simply returning an error, Matlab will open up the .m-file in which the error occurred, and you can use the debugging buttons on the toolbar to step up and down the stack, inspecting variables, etc., to try to figure out what happened.

To add additional breakpoints, you can simply click next to the line number in the editor.

It is typically easiest to add the above command directly to your Matlab startup file. That is located either in ~/matlab/startup.m or in a startup.m in the directory that you started Matlab from. Commands in startup.m are executed immediately after Matlab starts.
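
For reference, here is a minimal sketch of the related debugger commands as you might type them in the Command Window or place in startup.m. It only switches the stop-on-error behaviour on, inspects it, and switches it off again.

```matlab
% Pause in the debugger whenever an uncaught error is thrown.
dbstop if error

% Show which breakpoints and stop conditions are currently active.
dbstatus

% Switch the stop-on-error behaviour off again.
dbclear if error
```

While Matlab is paused in the debugger the prompt changes to K>>; typing dbquit there leaves debug mode and returns you to the normal prompt.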

Miscellaneous tips

Remember, if a line is not terminated with a ;, the Command Window will display the return value of that statement. To avoid scrolling delays, always suppress outputs with ;.

Watch for common mistakes with not necessarily clear error messages:

??? Subscript indices must either be real positive integers or logicals.

Translation: Somehow, you are trying to index or subscript into a matrix with a numeric variable that contains a zero, a negative number, or a non-integer value. Remember that a logical matrix and a numeric matrix of zeros and ones are different things: only logical matrices (created through a logical operation, like A>0, or converted with logical) can contain a "zero" value and still be passed in as an index.
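
A short sketch of this mistake and its fix follows. The vector and mask are made up for illustration, and try/catch is used so that the failing line does not stop the rest of the example.

```matlab
x = [10 20 30];

% A numeric array of zeros and ones is NOT a logical mask.
numericMask = [1 0 1];
failed = false;
try
    y = x(numericMask);       % the 0 is treated as an index, so this errors
catch
    failed = true;
end

% Convert to logical (or build the mask with a comparison) and it works.
logicalMask = logical(numericMask);
y = x(logicalMask);           % [10 30]
z = x(x > 15);                % [20 30]
```

If a mask has been through arithmetic, such as being multiplied or summed, it has usually been converted back to numeric, which is a common way to hit this error.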

???  In an assignment  A(:) = B, the number of elements in A and B must be the same.

Translation: You have a typo or a bug somewhere in your code. This happens when you are trying to assign values from one variable into a subset of another variable and the sizes don't match. Suppose you are trying to assign the j'th row of A to be equal to the i'th row of B, but you accidentally type A(j) = B(i,:) instead of A(j,:) = B(i,:). Now you are trying to assign a vector into a scalar, and that will throw this error. There are many ways for this to happen, and it's the most common error in my own programs.
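
The row-assignment slip described above can be reproduced as follows. This is a sketch with made-up matrices, and the failing statement is wrapped in try/catch so the example runs to the end.

```matlab
A = zeros(3);
B = [1 2 3; 4 5 6; 7 8 9];
i = 1;
j = 2;

% Wrong: A(j) is a single element, but B(i,:) has three values.
failed = false;
try
    A(j) = B(i, :);
catch err
    failed = true;
    disp(err.message);
end

% Right: address the whole j'th row so both sides have three elements.
A(j, :) = B(i, :);
```

When you hit this error, printing size() of both sides of the assignment usually shows the mismatch immediately.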

Error: The expression to the left of the equals sign is not a valid target for an assignment.

Translation: You used an assignment = somewhere where you intended to use a comparison == (most likely).

??? Error using ==> mtimes
Inner matrix dimensions must agree.

Translation: Check the dimensions of your variables -- most likely one of them is transposed incorrectly. Matrix multiplication is only valid for {$p \times n, n \times q$} matrices.
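
For instance, multiplying two row vectors fails, while transposing one of them gives either an inner or an outer product. The vectors below are made up for this sketch, and the failing line is wrapped in try/catch.

```matlab
a = [1 2 3];     % 1x3
b = [4 5 6];     % 1x3

% 1x3 times 1x3: the inner dimensions (3 and 1) do not match.
failed = false;
try
    c = a * b;
catch
    failed = true;
end

inner = a * b';          % 1x3 times 3x1 gives a scalar: 32
outer = a' * b;          % 3x1 times 1x3 gives a 3x3 matrix
elementwise = a .* b;    % same-size element-by-element: [4 10 18]
```

Checking size(a) and size(b) before the multiplication is the quickest way to decide which of these you actually meant.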

Octave is an open source alternative to Matlab; it is similar, but has slightly less strict syntax, and doesn't come with the high quality GUI. You can find out more about Octave at http://www.gnu.org/software/octave/.