Stochastic Systems Analysis and Simulation
Stochastic systems analysis and simulation (ESE 303) is a class that explores stochastic systems which we could loosely define as anything random that changes in time. Stochastic systems are at the core of a number of disciplines in engineering, for example communication systems and machine learning. They also find application elsewhere, including social systems, markets, molecular biology and epidemiology.
Figure 1. Ranking of nodes in networks. The idea is to associate importance of a node with its degree of connectivity. A broadly applicable definition of the importance of node $i$ is its degree of connectivity. However, it is not only a matter of counting the number of links to $i$, but also of pondering the importance of the nodes that point to $i$. A link by node 4 is more important than a link from node 1. We will study the algorithm used by Google to solve this ranking problem.
The goal of the class is to learn how to model, analyze and simulate stochastic systems. With respect to analysis we distinguish between what we could call theoretical and experimental analysis. By theoretical analysis we refer to a set of tools which let us discover and understand properties of the system. These analyses can only take us so far and is usually complemented with numerical analysis of experimental outcomes. Although we use the word experiment more often than not we simulate the stochastic system in a computer and analyze the outcomes of these virtual experiments.
To get a better feel of what this class is about you might want to check the slides for the first two lectures that study a simple stochastic system. This example considers a certain game in a certain casino where your chances of winning are slightly better than your chances of loosing. More specifically, you place 1-dollar bets, with probability $p$ you are paid 1 dollar and with probability $1-p$ you loose you bet. The good news is that $p>1/2$. The catch is that you have to keep playing forever or until you loose all your money. If you have $w$ dollars is it a good idea to play this game or not? You have to balance the fact that while you are more likely to win than loose, there is always a chance of getting unlucky a few times and losing all your money. And since you have to keep playing forever the latter possibility cannot be ignored heedlessly. To try things on your own here is the Matlab code to simulate one experiment. This other code repeats the experiment a hundred times and the numerical analysis of outcomes is undertaken here.
The remaining of this page contains information about the following
- Class Contents. Summary description of topics covered.
- Applications. Applications of stochastic processes that will be studied in class.
- References. Textbook and other materials.
- Prerequisites, grading and office hours.
The class is roughly divided in four blocks. The first block is a quick review of probability. As part of this block we study some commonly used probability distributions including normal, uniform, binomial and exponential. We will also talk about the definition of limits in probability theory. We'll close this block with the introduction of the concept of stochastic process. This will consume six lectures.
Figure 2. The lac operon. Cells can consume glucose, but if glucose is not available they can reduce lactose to glucose by synthesizing the enzyne $\beta$-galactoxidase. The production of $\beta$-galactoxidase is controlled by the lac operon that includes a repressor and promoter gene (top). The repressor can be activated, resulting in copious production of $\beta$-galactoxidase (medium), or repessed, resulting in basal production of the enzyme. Since reactions occur at random and the number of molecules involved is small, stochastic models are needed to understand the lac operon and biochemical reactions in general.
Not to get too technical about it, a stochastic process is a function that assigns a function to a random event (compare this with the definition of a random variable as a function that assigns a value to a random event). These are complicated entities, and we usually restrict our attention to cases that have a more tractable description. In the second block of the class we encounter such first tractable class: Markov chains (MC). MCs involve a discrete time index $n$ and a time-varying state $X(n)$ taking values in a finite or countable space. The defining property of MCs is that the probability distribution of $X(n+1)$ is unequivocally determined if $X(n)$ is known. This is called the memoryless property as it implies that the past history of the process, that is, what happened until time $n-1$ is irrelevant for the future of the process (what will happen after time $n$). The block on MCs will close with a description of the algorithm used by Google to rank web pages. We will use nine lectures for this block. You can download slides for the lectures and Matlab code for examples covered in class.
We will then consider a continuous time index $t$, but still restrict our attention to finite or countable states $X(t)$. We will also keep the memoryless property which in this case is stated as requiring the probability distribution of $X(s)$ given $X(t)$ for any $s>t$ to be independent of $X(u)$ for all $u<t$. Processes with this property are unimaginatively called continuous time Markov chains (CTMC). Albeit more mathematically challenging, CTMCs are very common in practice. In rough terms, we can say that any system that can be deterministically modeled with a differential equation can be stochastically modeled as a CTMC. We will close this block with two examples: (i) A description of queueing systems and their application to communication networks. (ii) Simulation of chemical reactions with small number of reactants and their application in biochemistry. We will devote twelve lectures to this topic.
The natural progression of the class is to eliminate the restriction on the countable number of states and the memoryless property. This will be covered in the fourth block of the class on general stationary random processes (SSP). Simple examples of SSPs, including Brownian motion, geometric Brownian motion and white noise will be introduced. The most important tools in the analysis of SSP are the autocorrelation function and the power spectral density, the latter of which is a generalization of the Fourier transform to random settings. This block will close with a discussion of Black–Scholes model for option pricing. Nine lectures will be used for these digressions.
As described in the class contents above, interspersed throughout the class we will study stochastic systems that arise in different disciplines. In the last 30 years or so we have come to realize that randomness is a fundamental property of systems. Therefore, much of systems' analysis that used to be done assuming deterministic behavior is nowadays done from a stochastic systems purview.
Figure 3. Price of Cisco (CSCO) between November 25, 2008 and November 24, 2009. A look at this picture suffices to convince oneself that randomness plays a role in the price of stocks. Empirically, it has been observed that stock prices can be modeled as a geometric Brownian motion. This observation is the basis for the pricing of options and derivatives in general.
The first application concerns the problem of ranking elements of a network that arises in social sciences as well as in computer science. The idea is to associate importance of a node with its degree of connectivity. However, it is not only a matter of counting the number of links to a node, but also of pondering the importance of the nodes that point to it, see Fig. 1. An elegant solution to this problem is a stochastic algorithm based on a random walk through elements of the network. The best known uses of this algorithm are Google's PageRank that ranks web pages returned by Google searches and the categorization of scientific papers. This algorithm will be covered in class and in Homework 6.
We will also consider simulation and modeling of biochemical reactions that are characterized by the involvement of a small number of reactants. In such cases it is known that randomness may play an important role in the overall behavior of the system. Gillespie's algorithm is the workhorse technique that will be covered in class along with representative examples. In Homework 10 you will use this algorithm to study the gene autoregulatory network that controls the digestion of lactose; see Fig. 2.
A third set of examples is queueing systems. These systems comprise arrival of customers and their service. In between their arrival and their service customers stay in a queue. We purport to answer questions on how much time customers stay in the queue, and how many customers are expected to be waiting for service. This model is pervasive in communications where "customers" represent information packets to be transmitted.
The last example covered in class comes from financial engineering, namely the Black–Scholes model for option pricing. A look at Fig. 3 is enough to realize that randomness plays a role in the price of stocks. Empirically, it has been observed that stock prices can be modeled as a geometric Brownian motion. This fact and the concept of arbitrage, that is, the opportunity to realize a benefit without risk, permit introduction of the risk neutral measure. Black-Scholes formula for option pricing, as well as the pricing of derivatives in general, is a simple application of the risk neutral measure. In Homework 13 you will use the risk neutral measure to price an option on Cisco's stock.
There are many other examples that could be studied, and we are indeed, considering many more in the homework assignments. You are invited to take a look.
The textbook for the class is
- Sheldon M. Ross "Introduction to Probability Models", Academic Press, 9th ed.
Those of you interested in delving into some topics may be interested in the following books
- Sheldon M. Ross "Stochastic Processes", John Wiley & sons, 2nd ed. A similar book by the same author that is used for graduate-level classes on stochastic processes. It contains essentially the same material as the class's textbook except that the presentation has more rigor and proofs of theorems are given.
- Darren J. Wilkinson "Stochastic Modelling for Systems Biology", Chapman & Hall/CRC, 1st ed. An accessible introduction to the use of stochastic systems in the modeling of biological systems. The part of the class dealing with simulation of biochemical reactions follows the treatment in this book.
- Masaaki Kijima "Stochastic Processes with Applications to Finance", Chapman & Hall/CRC, 1st ed. Use of stochastic processes in finance.
Prerequisites, grading and office hours
This a junior level class that you can take it on your senior year as well. Some sophomores have taken this class in the past, they struggled through the first two or three weeks but did fine afterward. The class is self contained to the extent possible. Important tools are always described when introduced. That said, to take this class it is beneficial to have good knowledge of Probability Theory and Linear Algebra. Familiarity with differential equations and Fourier transforms is helpful. If the latter two things are unknown to you it shouldn't be a problem. If you are not fluent in the former two topics, Probability Theory and Linear Algebra, you will need to learn them as we go. The decision on whether to take the class or not is yours, not mine. If you want to ask my opinion, you can find me at my office in Levine 362 or send me an email.
For homework assignments we will use Matlab. Programming in Matlab is easy. If you have not used it before you will need a couple of days worth of work to pick up the basics. Matlab's user guide can be downloaded from here.
We will have a midterm and a final valued at 36 points each. We will also have 14 weekly homework assignments valued at 2 points each. If your homework is truly outstanding, you will be awarded 3 points for it. The grade scale is the usual. At least 60 points are required for passing, a C requires at least 70 points, a B at least 80 and an A at least 90. In earlier editions of this class 3/4 of students scored A's (between A+ and A-).
The professor teaching this class is Alejandro Ribeiro. He holds office hours on Wednesdays from 11am to noon in Levine 362. The class meets on Mondays, Wednesdays, and Fridays from 10am to 11am in Moore 216.