## Multi-Robot Information Gathering

## Introduction and Related Work

Networks of robotic sensors can collect data in a wide variety of unknown environments, and are particularly useful in areas that are difficult or unsafe for humans to enter. This project implements an information-gathering algorithm based on the gradient of mutual information, a special case of the algorithm described in [1]. The difference between the work in [1] and this project is that here we do not consider sensor failures due to hazards in the environment.

One example of a scenario where such an algorithm could be used is the recent accident at the Fukushima nuclear power plant, where an earthquake and tsunami damaged the facility, leading to radiation leaks. While workers need to locate damaged areas to repair them, it is hazardous for humans to enter the facility. It would be preferable for a team of robots to do this exploration, after which other robots, or human teams, could come in to make the repairs. A cartoon illustrating this example can be seen below.

**Figure 1**: *Hypothetical map of the events shown over an image of the Fukushima plant from http://maps.google.com/. The events of interest are structural defects represented by the explosion symbols, and the contour lines represent the probability of detecting these defects. Sensors move to determine where the structural defects are by increasing the informativeness of their sensor readings, with the directions represented by the arrows. Black circles represent sensors that see a defect while white circles do not see a defect.*

Mutual information, as defined by Shannon in [2], is a commonly used quantity to measure the performance in an estimation task. In [3] mutual information was used in a target tracking application, where the authors use particle filters to approximate the mutual information in an effort to decentralize and scale the algorithm. Another paper [4] also uses the gradient of mutual information for state estimation tasks, similar to this, but utilizes a Monte Carlo-based sampling method to increase computational speed.

There are many other methods that do not use mutual information for exploration and estimation tasks. One such method is a Voronoi tessellation-based controller for coverage and tracking from [5], similar to that described by Kartik Mohta.

## Problem Formulation

In this section we will cover the parts of the framework from [1] necessary for the estimation and control problems. We also give the equations used for the specific example shown in the simulation at the end.

### Robot Model

We assume that we have n mobile robots in the plane, each with a path-following controller that can bring the robot from its current position x_i to a position determined by the controller. For the specific example here, we are using the differential-drive scarab robots with the path-following controller used in project 1, i.e. a PI controller where the proportional error is the distance from the path and the integral term is the velocity error.

### Environment

The environment is assumed to be a planar polygon, with the vertices of the polygon given. While the robots move in continuous space, we divide the environment into grid cells for the estimation problem. The task is then to determine which grid cells contain an object of interest (e.g., structural damage) with high probability, thereby localizing the targets. Let q_j be the centroid of grid cell j, and let s_j be the state of that cell, i.e. occupied or empty. In our example, the environment is a 15x15 m square divided into a 3x3 grid of square cells.

### Sensor Model

We assume that the robots are equipped with simple binary sensors, which return a 1 if they see an event of interest (e.g., structural damage in the power plant) and a 0 otherwise. Let y_i denote the output of the sensor on robot i. Note that this does not preclude having other sensors on board for obstacle avoidance, localization, etc. The effects of the coarseness of this sensor model will be discussed later.

For our specific scenario, the scarab robots have a Hokuyo laser range finder on board, which has a 4 m range and a 240 degree field of view. The targets were 0.5 m diameter barrels distributed in the environment. To reduce this to a binary sensor, RANSAC was used on the laser data to fit a cylinder, returning a 1 if a cylinder of the correct size was detected. The actual model for the sensor is given by

where P_{fn} is the probability of a false negative (0.30), P_{fp} is the probability of a false positive (0.001), r_{ij} is the distance from sensor i to the centroid q_j, r_{\rm max} is the maximum range (4 m), and a is a parameter dictating the rate of the logistic drop-off (10 m^{-1}). In the simulation the sensor footprint is actually a step function, but it was approximated with a logistic function because the logistic is smooth, and hence differentiable.
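A logistic detection model consistent with these parameters can be sketched as follows (this is my reconstruction from the parameter definitions; the exact expression in [1] may differ in form):

```latex
P(y_i = 1 \mid s_j = 1) = P_{fp} + \frac{1 - P_{fn} - P_{fp}}{1 + e^{\,a (r_{ij} - r_{\rm max})}}
```

This gives a detection probability of approximately 1 - P_{fn} well inside the sensor range and approximately P_{fp} well outside it, with a logistic transition of rate a around r_{\rm max}.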

Then the probability that the sensor does not detect anything is just the complement of the equation above, since the output y_i is a Bernoulli random variable. We also make the assumption that all targets are independent so that the probability of sensor i not detecting any target is given by
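Writing p_{ij} for the single-target detection probability above, one way to sketch this no-detection probability under the independence assumption is:

```latex
P(y_i = 0 \mid s_1, \ldots, s_J) = \prod_{j \,:\, s_j = 1} \left( 1 - p_{ij} \right)
```

where the product runs over the occupied cells; if no cell is occupied, the (empty) product is 1, i.e. the sensor reports nothing.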

## Control Law

As noted in the introduction, the control law is based on the gradient of the mutual information between the time-history of the sensor readings and the environment state, i.e. the occupancy grid. However, we do not know the true environment state, so it must be estimated. To do this, we use the recursive Bayes filter given by

where script S is the set of all occupancy grids. The initial distribution is uniform over all possibilities, since we have no prior information about event locations. Also note that taking the expectation of this distribution gives the expected probability of occupancy in each grid cell at a given time step.
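As a concrete illustration, the filter can be sketched in Python for the 3x3 example. All names and the code structure here are mine, not from the original implementation; the parameter values are the ones quoted in the text, and the logistic detection model is an assumed form.

```python
import numpy as np
from itertools import product

# Parameter values quoted in the text (the logistic model form is an assumption)
P_FN, P_FP, R_MAX, A = 0.30, 0.001, 4.0, 10.0

def p_detect(r):
    """Per-cell detection probability as a function of range (assumed logistic form)."""
    return P_FP + (1.0 - P_FN - P_FP) / (1.0 + np.exp(A * (r - R_MAX)))

def likelihood(y, sensor_pos, centroids, s):
    """P(y | occupancy grid s) for one binary measurement from one sensor."""
    r = np.linalg.norm(centroids - sensor_pos, axis=1)
    # Probability of NO detection: product over occupied cells (targets independent)
    p_miss = np.prod([1.0 - p_detect(ri) for ri, sj in zip(r, s) if sj])
    return (1.0 - p_miss) if y == 1 else p_miss

def bayes_update(belief, grids, y, sensor_pos, centroids):
    """One step of the recursive Bayes filter over all 2^J occupancy grids."""
    post = belief * np.array([likelihood(y, sensor_pos, centroids, s) for s in grids])
    return post / post.sum()

# 3x3 grid of 5 m cells in a 15x15 m environment, as in the example
centroids = np.array([[2.5 + 5*c, 2.5 + 5*r] for r in range(3) for c in range(3)])
grids = list(product([0, 1], repeat=9))          # all 512 possible occupancy grids
belief = np.full(len(grids), 1.0 / len(grids))   # uniform prior

# One detection near the first cell's centroid
belief = bayes_update(belief, grids, y=1,
                      sensor_pos=np.array([3.0, 3.0]), centroids=centroids)

# Expected occupancy of cell j = sum over grids of belief(s) * s_j
expected = np.array(grids).T @ belief
```

Note that the state space grows as 2^J in the number of cells J, which is the source of the scaling problem discussed at the end of the report.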

### Mutual Information

As the name indicates, the mutual information of two random variables is a measure of how much knowledge of one variable tells you about the other. In other words, if two random variables were completely independent then they would have no mutual information. In fact, we see this appear in the definition of mutual information

where the argument in the log will be identically 1 if the variables are independent, making the mutual information zero. While this expression was written assuming continuous-valued random variables, an analogous formula holds for discrete variables where the integral is replaced by a sum.
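For reference, the standard definition for two continuous random variables X and Y with joint density p(x, y) is:

```latex
I(X; Y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy
```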

### Gradient of Mutual Information

In this scenario, we would like the robots to efficiently gather information as they move about the environment. So by following the gradient of the mutual information, the sensors should stay near a local optimum, gathering close to the maximum information about the target locations at each time step. While I will not go through the derivation here, it is relatively straightforward to analytically compute the gradient of the mutual information with respect to some dependent parameter, in this case the positions of the sensors. See Theorem 2 in [1] for the derivation of the gradient. This gives us the expression

Note the similarity between this expression and the definition of mutual information.

Using this gradient, we can then define the control law

where the epsilon in the denominator prevents singularities when the controller reaches a local maximum. Here k is the control gain which, together with the time step, dictates the distance each robot travels per time step. It is also important to note that, while the environment is assumed to be static, the mutual information will be time-varying due to the recursive estimation of the environment state. As a result, even if a robot returns to the position of a previous measurement, the gradient it calculates there can be quite different.
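The normalized gradient-ascent step described above can be sketched numerically. This is a hypothetical illustration: it approximates the gradient with finite differences and uses a toy unimodal "information surface," whereas the actual system computes the gradient of the mutual information analytically.

```python
import numpy as np

def control_step(pos, info_fn, k=1.0, eps=1e-6, h=1e-4, dt=1.0):
    """One step of the normalized gradient-ascent control law:
    x <- x + dt * k * grad I / (||grad I|| + eps).
    The gradient is approximated here by central finite differences."""
    grad = np.zeros_like(pos)
    for d in range(pos.size):
        e = np.zeros_like(pos)
        e[d] = h
        grad[d] = (info_fn(pos + e) - info_fn(pos - e)) / (2 * h)
    return pos + dt * k * grad / (np.linalg.norm(grad) + eps)

# Toy example: a smooth "information surface" peaked at (5, 5);
# in the real controller info_fn would be the mutual information.
peak = np.array([5.0, 5.0])
info = lambda x: -np.sum((x - peak) ** 2)

x = np.array([1.0, 1.0])
for _ in range(50):
    x = control_step(x, info, k=0.5)
```

Because the step length is (approximately) fixed at dt * k, the robot oscillates within one step of a local maximum once it arrives, which is the behavior the epsilon term is guarding against.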

## Simulation Results and Discussion

After laying the foundation above, we can now show simulation results. The first such result is shown in the figure below. Note that there is no obstacle avoidance in the controller, so at one point the scarab drives through a barrel.

**Figure 2**: *This figure shows the path of the sensor during the simulation. The sensor starts at the black triangle and ends at the black square. The red circles denote every fifth measurement to give a sense of the time scale. The blue circles are the targets.*

The resulting occupancy grid is given below in Table 1. There are a few important things that we can see from this. Firstly, the original sensor model assumes that the field of view is isotropic, which in reality it is not. We partially took this into account by setting the probability of a false negative to 0.30, but if we look at the cell in row 2, column 3 we see that our estimate is completely wrong. From the simulation, I saw that when the sensor was in that cell, it was facing toward the top of the environment so that the target was in the sensor's blind spot, causing it to miss the target repeatedly and driving the estimate away from the true state.

| Column 1 | Column 2 | Column 3 |
|:---:|:---:|:---:|
| 1.0000 | 0.0000 | 0.0000 |
| 0.6444 | 1.0000 | 0.0000 |
| 0.5778 | 0.6369 | 0.5778 |

**Table 1**: *This table shows the expected probability of occupancy for each grid cell.*

We can also see that the controller reaches a local maximum before it has explored the entire environment. This causes our estimates in the unexplored regions to be close to 0.5, meaning that we are very unsure of the state of the environment in those cells. Plots of the mutual information, and the entropy of the state estimate, are shown below.

**Figure 3**: *On the left is the entropy of the state estimate (using log base 2) during the course of the simulation. On the right is the mutual information. The x-axis on both figures is the iteration number.*

Another important observation is about the size of the grid cells. Due to the simplicity of our sensor model, i.e. that the only information about relative location comes from repeated measurements, we will not be able to localize targets in a very precise manner. In other words, having grid cells that are significantly smaller than the footprint of the sensor only adds computational load.

### Simulation 2

If we run the same simulation again, we get different results due to noise added by ROS, RANSAC missing an obstacle, etc. The results from a second run are shown below.

**Figure 4**: *This figure shows the path of the sensor during the simulation. The sensor starts at the black triangle and ends at the black square. The red circles denote every fifth measurement to give a sense of the time scale. The blue circles are the targets.*

| Column 1 | Column 2 | Column 3 |
|:---:|:---:|:---:|
| 0.3104 | 0.0000 | 0.0000 |
| 0.3911 | 1.0000 | 0.0000 |
| 0.3105 | 0.3831 | 0.3105 |

**Table 2**: *This table shows the expected probability of occupancy for each grid cell.*

**Figure 5**: *On the left is the entropy of the state estimate (using log base 2) during the course of the simulation. On the right is the mutual information. The x-axis on both figures is the iteration number.*

As we can see, the robot does not get up to the top left corner as it did before, and it again misses the target in row 2, column 3. Even when run for more time steps than the last simulation, it remained stuck at the same local maximum. One interesting observation is that all of the uncertain cells are biased towards empty, while last time they were biased towards occupied; I do not have an intuitive explanation for why this is the case. I also ran this same configuration several more times, with most of the results being relatively similar. However, there was one outlier where the robot went to the upper left area but had the target in its blind spot. The robot then quickly reached a local maximum and sat there for the remainder of the time, taking measurements of the wall and missing the target.

### Simulation 3

For this simulation I started the robot near the top right corner instead of near the center. As you can see from the video, the robot quickly reaches a local maximum, only exploring a few cells. I found that this behavior was typical for starting positions near the edges of the environment, with the robot not even leaving the starting cell in some cases. Again, I have no intuitive explanation for this behavior, but it will be a subject of further study.

**Figure 6**: *This figure shows the path of the sensor during the simulation. The sensor starts at the black triangle and ends at the black square. The red circles denote every fifth measurement to give a sense of the time scale. The blue circles are the targets.*

| Column 1 | Column 2 | Column 3 |
|:---:|:---:|:---:|
| 0.3206 | 0.3206 | 0.3206 |
| 0.3206 | 0.3206 | 1.0000 |
| 0.3206 | 0.9761 | 0.0000 |

**Table 3**: *This table shows the expected probability of occupancy for each grid cell.*

**Figure 7**: *On the left is the entropy of the state estimate (using log base 2) during the course of the simulation. On the right is the mutual information. The x-axis on both figures is the iteration number.*

## Future Work

While this project shows a basic proof of concept for this controller, it covers only the limited setting of a single robot with a relatively coarse grid. One obvious next step is to incorporate multiple robots, as the title of the project includes the phrase "multi-robot." While I was not able to accomplish this in the time frame of the project, I plan to continue working on it in the coming weeks.

Secondly, some modifications to the controller are necessary to make it more realistic. We need some sort of obstacle avoidance to avoid having the robot pass through obstacles and walls. Also, when the robot is at a local maximum for a sufficiently long time, we should add a rotational motion to ensure that a target is not hiding in the laser scanner's blind spot.

Lastly, we need to find a way to scale the algorithm. In the current implementation it is infeasible to run with a finer grid than 3x3; even going up to 4x4 takes minutes to calculate a single control step. Several of the papers in the related work section make approximations to speed up the calculations, so we should look into these and other methods.

## Acknowledgements

I would like to thank Mac Schwager, Prof. Daniela Rus (at MIT), and Prof. Vijay Kumar for their contributions to this project. In particular, much of the MATLAB code used to calculate the control update was written by Mac. I would also like to thank Ben Charrow for his help in setting up the ROS simulation and for patiently answering my barrage of questions on the code.

## Bibliography

1. Schwager, M., Dames, P., Rus, D., and Kumar, V. (2011). A multi-robot control policy for information gathering in the presence of unknown hazards. In Proceedings of the International Symposium on Robotics Research (ISRR). Submitted.
2. Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27:379–423.
3. Hoffmann, G. M. and Tomlin, C. J. (2010). Mobile sensor network control using mutual information methods and particle filters. IEEE Transactions on Automatic Control, 55(1):32–47.
4. Julian, B. J., Angermann, M., Schwager, M., and Rus, D. (2011). A scalable information theoretic approach to distributed robot coordination. In Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS). Submitted.
5. Schwager, M., Rus, D., and Slotine, J. J. (2009). Decentralized, adaptive coverage control for networked robots. International Journal of Robotics Research, 28(3):357–375.

## Final Presentation Comments

### Anirudha Mahumdar

I like the idea of using the natural dynamics of a robot to control it; this seems like a very useful vein of research. The presentation would have been improved by more discussion of the dynamic model, since it is at the core of the project.

### Avik De

I liked the connection between differential geometry and path planning that we had not talked about in class. Nice plots and animations to illustrate what you were doing. However more quantitative results regarding the optimality of the paths would strengthen it.

### Ben Charrow

This was a very cool demonstration, and there was a very clear connection between this specific task and a set of high-level goals (i.e. robots working in human environments). As mentioned in the next steps, adding more perception to detect door handles would be good.

### Brian MacAllister

I liked the demonstration video, it summed up the project well and showed that you were able to get this to work even in a somewhat complicated environment. More figures/diagrams would have been nice to explain the project.

### Caitlin Powers

I like the idea of doing simultaneous control and estimation of an object, it's a much more realistic scenario than exactly knowing the parameters or needing a separate estimation step. I think that having a movie of the arm moving would give more intuition for how a poor estimate of the mass affected the motion of the arm.

### Fei Miao

I liked the comparison between the A* and RRT path planning methods; it was nice to hear more about RRTs since we didn't cover them in class. It seemed that with your control scheme only one robot moved at a time, so I think simultaneous motion would be a good next step.

### Jason Owens

I liked the video, your ability to segment an amorphous shape like a backpack from the background was impressive. As you go forward with the project you could add feedback based on the vision, for example to get the robot on the correct side of the backpack to open the zipper, or lift it without dumping the contents out.

### Kartik Mohta

I liked the explanation of Lloyd's algorithm and that collision avoidance is, in some sense, implicitly built in to the controller since it keeps the robots spread out. However, a real-world example where this could be used would have strengthened the presentation.

### Menglong Zhu

I like that this project was very different from all of the others, tackling a vision/perception problem rather than control. The video was really interesting. I think that a bit more explanation of how the PR2 was detecting and parsing words would have been useful.

### Stephen McGill

I liked the video of the project and that it combined planning, vision, and control in one concrete setting. A nice way to wrap up the course. I'm not sure of the sensing capabilities of the Darwin robots, but it seems like having feedback about the object being lifted would be useful in a coordination setting. For example, if one robot is carrying more of the load then it will likely move slower, so the other robot will have to compensate for this.