# Homework

Homework sets and solutions for ESE 303. The class comprises 14 homework sets, each of which is a small project that studies a particular stochastic system. You are typically required to answer some questions about properties of the stochastic system, write a simulation and use this simulation to understand some properties of the system. The list of homework sets and solutions follows. If you are the kind of person that prefers bare bone lists, refer to the calendar. You can refer to the end of the page for matters about logistics and the policy on grading, collaboration and use of posted solutions.

## List of homework sets and solutions

**Introduction.** In this first block you analyze a simple stochastic system. This first assignment is a little unconventional since we are asking you to perform the type of analysis this class is supposed to teach you. Therefore, some parts of it might not be very clear to you but will become clearer as we progress in the class. The idea is to gain an appreciation for what the challenges are in analyzing a stochastic system. I also want you to appreciate how it is possible to state facts about the system's behavior even if randomness implies that anything is possible.

- Week 1, analysis of a lower bounded biased random walk (due on Friday 9/1). This example considers a certain game in a certain casino where your chances of winning are slightly better than your chances of loosing. More specifically, you place 1-dollar bets, with probability $p$ you are paid 1 dollar and with probability $1-p$ you loose you bet. The good news is that $p>1/2$. The catch is that you have to keep playing forever or until you loose all your money. If you have $w$ dollars is it a good idea to play this game or not? You have to balance the fact that while you are more likely to win than loose, there is always a chance of getting unlucky a few times and losing all your money. And since you have to keep playing forever the latter possibility cannot be ignored heedlessly. If you are having trouble with this homework, this piece of code simulates one experiment. This other code repeats the experiment a hundred times and the numerical analysis of outcomes is undertaken here. You are well advised to first try things on your own. Homework solution is available here.

**Probability review.**

- Week 2, relationship between Bernoulli, binomial, Poisson and normal random variables (due on Friday 9/8). On a fundamental level all random phenomena depend on something happening or not happening. Thus, all random phenomena are coin tosses. If that is true, then why do we study probability? The answer is that random phenomena tend to generate regular structure. If you throw a coin a thousand times, you can expect quite certainly to get at least 400 heads. We started with an uncertain system but ended up with a certain statement. This regularity appears in the limit of sequences of random variables and is exploited by anyone doing serious applications of probability to real world problems. In the end, the fact that randomness generates structure is the core epistemological value of the theory of probability (this said by Kolmogorov, not me). To help you appreciate the beauty of that this problem starts with sequences of coin tosses (Bernoulli random variables) and considers the number of heads in the sequence. As the number of tosses increases, you'll see how Poisson and normal random variables appear naturally. These two limit behaviors are examples of random phenomena with a particular structure (sequences of Bernoulli, described by binomial distributions) that when considered in a large scale give rise to a different type of structure (normal or Poisson). Homework solution is available here.
- Week 3 (due on Friday 9/15). Say you are selling your car for which you are going to receive $J$ offers. Offers will be of different value, and if all $J$ of them were presented upfront, the car would be sold to the highest bidder. Alas, potential buyers make their offers in a random order and if not accepted they withdraw their bid – presumably, they can ﬁnd a different seller willing to accept their offer. The issue here is the design of a decision mechanism that allows you to sell your car at a reasonable price. In principle, it makes sense to discard the first few bids since you do not know where they fit in the pool of potential offers. Then, you can choose the first offer that fares comparatively well against those that you rejected. The questions here are how many bids do we need to discard and what does it mean for an offer to fare comparatively well with respect to others. Homework solution is available here.

**Markov chains.**

- Week 4, propagation of mitochondrial DNA types (due on Friday 9/22). Mitochondrial DNA is passed from mother to children without genetic contribution from the father. All the variability in mitochondrial DNA is due to random mutations accumulated over time. Using estimates of the mutation rate and differences on mitochondrial DNA between humans it becomes possible to estimate the time at which groups became distinct populations. We are not computing these times here but a few facts about a Markov chain that models the propagation of mitochondrial DNA. In particular, you are asked to follow your mitochondrial DNA for the next few centuries. The Markov chain that we use here belongs to the class of branching processes that are also used, for example, to model population dynamics, chain reactions, the growth of links to web pages and the emergence of hub airports. Homework solution is available here.
- Week 5, Aloha protocol for random access communication (due on Friday 9/29). A pervasive situation in communication systems is to have a common infrastructure access point serving a group of physically distributed devices. This is, for example, the situation of your cellphone sending information to the cellular base station, your laptop transmitting to the 802.11 wireless router, or a group of satellites communicating with a common ground station. In these three cases there is a common AP, the base station, the router, or the ground station, serving a group of distributed devices, cellphones, laptops, or satellites. A problem in the uplink communication from physically distributed terminals to the AP is how to separate the information transmitted by different devices. One possibility is to assign different times or frequencies to different terminals. This is called time or frequency division multiplexing. Another possibility is to let terminals transmit at random and hope for luck to avoid simultaneous transmissions. This is called random access. Random access is advantageous because it requires little coordination between terminals. However, it is not clear that random access is a viable communication strategy. You will see in this problem that random access actually performs surprisingly well. A minor variation of this protocol is used in WiFi standards. Cellphones use this protocol to reserve resources for a voice call (the actual voice is transmitted using a different protocol). Short message services (SMS) also use a version of this protocol. Random access goes under the picturesque name of Aloha because it was first applied to provide satellite communication services to the islands of Hawaii. Homework solution is available here.
- Week 6, Google's PageRank algorithm (due on Monday 10/9). The most popular algorithms to rank pages in a web search are stochastic. Consider a web surfer that visits a page and clicks on any of the page's links at random. She repeats this process forever. What fraction of his time will be spent on a given page? The answer to this question is the rank assigned to the page. The same idea can be used to understand the structure of networks in different settings. For example, we can use this algorithm to extract connectivity information from a social graph. Say we choose a student and ask her to direct us to any of her friends selected randomly. We then go to this friend repeat the question and are directed to this new student. This is no different from the random web surfer model. Repeating this process forever we can therefore infer the degree of connectedness of students in the class from the average number of visits to each of them. This is not a pointless exercise. To market products, for example, it is worthwhile to concentrate the effort in the individuals that are most connected to other persons. The important insight here is that the network possesses knowledge that individuals do not. For this homework we use a homework collaboration matrix connecting students that cooperate in homework assignments. Homework solution is available here.

**Continuous time Markov chains.**

- Week 7, arrival of passengers at a subway station (due on Monday 10/16). In order to dimension public transportation systems we need to determine the number of customers that arrive at different stations. Since this a random quantity what we actually need to determine is the probability distribution of the number of customers that arrive in a certain time interval. In this problem you will see that as long as we can think of customers as behaving independently, the number of customers that arrive at a train station can be modeled as a Poisson process. Notice how a very unassuming hypothesis - customers behave independently -, leads to a rich structure - Poisson arrivals. This homework is very short. Since this week is your midterm I do not want you to waste time in it. If you get stuck, feel free to use this code as a reference. It is not acceptable to hand in this code as a solution. You have to provide your own. Homework solution is available here.
- Week 8, cash flow of an insurance company (due on Wednesday 10/25). The purpose of this problem is study a simplified version of the cash flow of an insurance company, and, more specifically, to determine the probability that the company pays dividends in a particular quarter. At any point in time three things might happen: (i) a customer pays a premium, (ii) a claim is paid to a client, or (iii) the company pays dividends. The probabilities of these events are different as are the amounts of money involved. A further constraint on the payment of dividends is that they are paid only if the cash on hand exceeds a certain amount. You will build a continuous time Markov chain model of the company's cash flow and use it to determine the probability that the company pays dividends in a particular quarter. This homework set is challenging and very important. If you do it correctly, you will become very proficient on continuous time Markov chains. Homework solution is available here.
- Week 9, Traffic engineering of a cellular base station (due on Wednesday 11/1). A cellular communication system like the one providing service to your cellphone divides the area of service into small units called cells. Each of these cells is serviced by a base station. There are two reasons for this arrangement, the decay of electromagnetic power with distance and the scarcity of electromagnetic spectrum. Power decay is the reason why there are places in which your phone doesn't work and you get a no-service message, while spectrum scarcity is the reason why there are times at which you cannot place a call and get a busy signal. The effect of having a limited amount of bandwidth is that there is a limit in the number of calls that a base station can handle. Exactly what this limit is and what the system's behavior is close to this limit, depends on the type of technology used. For our purposes suffices to say that there is a maximum number $K$ of calls that the system can handle. A problem that systems engineers have to solve is to determine the need to subdivide existing cells utilizing statistical traffic information collected by base stations. In the first section of this problem you are asked to build and analyze a stochastic model for the placement of calls in the service area of a base station. In the second section, you are asked to solve the problem of deciding when to add a new base station. Homework solution is available here.
- Week 10, stochastic simulation of the Lotka-Volterra system and the lac operon (due on Friday 11/10). In a departure from norm, this homework contains two problems. The first problem concerns the Lotka and Volterra's (LV) predator-prey model that involves a prey species $Y$ and a predator species $X$. The possible events in the system are the birth and death of prey and predators as well as the consumption of prey by predators. The second problem studies the lac operon, which forms the auto-regulatory gene network that controls digestion of lactose in some bacteria. Cells can only use glucose to generate energy, but they can reduce lactose to glucose if the latter is unavailable. Such digestion of lactose into glucose occurs in presence of the enzyme $\beta$-galactosidase. In a well engineered system, $\beta$-galactosidase would be produced only when there is lactose present and there is no glucose available. This is exactly the behavior exhibited by the lac operon auto-regulatory network. Homework solution is available here.

**Gaussian processes.**

- Week 11, white Gaussian noise (due on Friday 11/17). White Gaussian noise (WGN) is probably the most common stochastic model used in engineering applications. The idea is to model a stochastic process $X(t)$ for which individual values are normally distributed and values $X(t_{1})$ and $X(t_{2})$ for different times $t_{1}$, $t_{2}$ are independent. It is not difficult to believe that this is a very simple model. It simply represents the drawing of independent normal random variables at different times. In the first part of this problem you will develop a model of WGN. In the second part you will use WGN to model somewhat more complex systems. The goal is to observe that while WGN is very simple, it can be used to model complex stochastic systems. Homework solution is available here.
- Week 12, arbitrages in sports betting (due on Wednesday 11/22). Consider a random system in which you can place bets on different outcomes. An arbitrage is an opportunity to realize a gain independently of the outcome. More specifically, consider a sports event in which you can bet $x$ dollars on team A and $y$ dollars on team B. If team A wins you are paid $ax$ dollars and realize a profit $ax-(x+y)$ (the profit is actually a loss if this number if negative). If team B wins, you are paid $by$ dollars for a profit of $by-(x+y)$ dollars. Given $a$ and $b$, an arbitrage is possible if there exist a combination of bets $x$ and $y$ such that both profits are positive. Since bookies are not in the business of giving many away, this is in general impossible. However, it is often possible to bet $x$ with one bookie and $y$ with a different bookie to exploit an arbitrage opportunity. This is what you will do in this problem. Since this week is Thanksgiving, this is a very simple homework that you should be able to finish in less than two hours. Homework solution is available here.
- Week 13, pricing of stock options (due on Friday 12/1). The goal of this homework is to determine the worth of a European style option to buy a stock. A european style option on a stock with price $X(t)$ at time $t$ is a contract to have the option to buy the stock at a predetermined price on a predetermined future time. The option is described by the strike price $K$, the strike time $t$ and its price $c$. Paying $c$ to buy an option at time $0$ gives us the opportunity to buy the stock for the strike price $K$ at the strike time $t$. At time $t$ the worth of the option depends on the value of the stock $X(t)$. If the stock has fallen below the strike price $K$, i.e., if $X(t)\leq K$ the option becomes worthless. If, on the contrary, the price has risen beyond $K$, i.e., if $X(t)>K$ we can realize a gain $X(t)-K$ by exercising the option to buy the stock at price $K$. Given historic data about the stock's value, it is possible to have some expectations about the value of the stock at time $t$. The relevant question here is how to use these expectations to determine the price $c$ that should be payed to buy the option at time $0$. In this problem we use the closing price of Cisco (CSCO) for the last year to determine the price $c$ of an option to buy CSCO. We are using actual data, as you can corroborate from information available in Google finance (click on 1yr). Homework solution is available here.
- Week 14, Linear filtering of stochastic signals (Friday 12/8). In any system that involves electric signals we are interested in modifying them. For example, signals can be modified to remove noise or equalized to improve the perception of sound quality. The problem is how to design these operators for signals that are not know at the time of system design. In this problem you will design a sound equalizer to alter the spectral components of a sound wave. Homework solution is available here.

## Logistics

Homework sets are due by 5pm on Mondays, Wednesdays, or Fridays as per the calendar above. I ask that you please make your best effort to meet the deadline. If you can't, for whatever reason, I have no problem granting homework extensions. If you can't make the deadline, don't give me an excuse. Just tell me you need a few more days and that's it. That said, do not abuse that prerogative, if you are late a couple of times in the term, that's OK. If you are late more than that you need to organize your time better.

There is a drop box for you to hand in your homework at the ESE office. You have to drop your homework in that box by 5pm on the due date.

The class has two TAs to help you out with homework: Luiz Chamon and Kate Tolstaya. Office hours can be found here. Emails about homework or anything related to the course should be sent to ese303@seas.upenn.edu.

## Policy on grading, collaboration and use of posted solutions

The purpose of homework is to help you learn. Their implication in your final grade is mostly irrelevant. For each of the sets you will get a grade of 0, 1, or 2. You will get 1 point if the work is well done, 2 points if it is very well done. You will get no points if you skip the assignment. Occasionally, we'll give you 3 points for work that is exceptional in some way. The typical grading distribution includes 10 no shows, 45 to 50 very good sets receive a 2, 2 or 3 students receive a 1, and one student receives a 3. That means that at the end of the term almost all of you will have between 26 and 29 homework points as long as you don't skip homework. The points are essentially given for hard work. Therefore, it is foolish of you to use collaboration or posted solutions to avoid doing work. It'll prevent you from learning and you won't gain any advantage with respect to your peers. Despite of this we are aware that some of you will, for example, hand in the posted solution as your own. This is not only foolish but unethical. If you engage in unethical behavior you will get no points for the assignment and I will recommend that you drop the class and retake it next year.

Going back to the topic of learning, collaboration with peers is allowed and encouraged. If you are struggling with the material, your classmates are a more valuable resource than the teaching staff. If you are doing well in the class, there is no better use of your time than helping your peers. It is a very surprising fact of life that happiness is to be found when you give to someone, not when you receive from someone. The appropriate use of the posted solution is whatever you think helps you learn best. In general, that means you do not look at it after you submit your solution. Work on the assignment and try to make progress. If you're stuck, go talk with a classmate. If you are still stuck go talk with the TAs. Then come talk with me. After you submit your solution look at ours, it may teach you one or two extra things.