Collaborative Mobile Manipulation Among Humanoid Robots
The goal of this project is to enable mobile robots to collaborate on tasks that exploit synergies no independent robotic actor could achieve alone. The specific system implemented here is a cooperative system in which two DARwIn-OP humanoid robots pick up a table and carry it.
This cooperative system demands a level of synchronization between the robots. First, both robots must agree on a coherent model of the object's state. Second, maneuvering a common object requires coordinating a grasping strategy. Finally, the dynamics of humanoid robots require two harmonized bipedal walking gaits.
There are a few examples of cooperation between humanoid robots from the University of Tokyo [1,2]. In these systems, machine learning techniques are used to find optimal motion plans and motion primitives (approach, slide, spin, etc.) for transporting objects.
In these articles, however, the specific implementation is not discussed, and there is minimal information on the experimental setup. In another article [3], the experimentation is better documented, but the vision system is exogenous to the robots: a "third-person omniscient" view of the situation commands both robots. In our system, we would like the robots to rely on local rather than global information.
The final paper [4] describes an on-board camera system for each robot, but gives little other detail on the experimentation.
To tackle these demands, the project is split into four distinct problems: Object State Recognition, Robot Alignment, Grasping Strategy, and Harmonized Gait.
Object State Recognition is broken down into identifying potential tables in an image and filtering to reduce false positives. The most likely remaining object (if any objects remain) is determined to be the table. Once the table is identified, its key features (orientation, position, and distance) are calculated.
Since the table consists of two colored, parallel, cylindrical poles, we identify and filter potential tables based on these known characteristics.
First, all connected regions of pink pixels are deemed potential tables. A color classifier is trained on sample images to form a lookup table that assigns a label (table or non-table) to every pixel in an image, producing a labeled image.
From this labeled image, a connected-components algorithm collects a set of "objects." Each object is then passed through a size filter: if it does not contain a minimum number of pixels, it is cast off as noise. Next, the remaining objects are filtered on aspect ratio. The height-to-width ratio of the poles is known, so objects not close to this ratio are also cast aside.
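The filtering pipeline above can be sketched as follows. This is a minimal illustration, not the actual on-robot code: the image representation, the thresholds (`min_pixels`, `target_ratio`, `tol`), and the function names are all my own assumptions.

```python
from collections import deque

def connected_components(labeled):
    """4-connected components over a binary labeled image (1 = table-colored)."""
    h, w = len(labeled), len(labeled[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y in range(h):
        for x in range(w):
            if labeled[y][x] and not seen[y][x]:
                # BFS flood fill to collect one connected region
                q, pixels = deque([(y, x)]), []
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and labeled[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                objects.append(pixels)
    return objects

def filter_poles(objects, min_pixels=4, target_ratio=3.0, tol=1.0):
    """Keep regions big enough and tall/thin enough to be a pole."""
    poles = []
    for pixels in objects:
        if len(pixels) < min_pixels:              # size filter: discard noise
            continue
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        height = max(ys) - min(ys) + 1
        width = max(xs) - min(xs) + 1
        if abs(height / width - target_ratio) <= tol:   # aspect-ratio filter
            poles.append(pixels)
    # the two largest remaining regions are taken to be the poles
    return sorted(poles, key=len, reverse=True)[:2]
```

On the real robot, the lookup table produced by the trained color classifier would supply the binary labeled image fed into this pipeline.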
The two largest remaining objects are then considered to be the two poles, and their relative ground coordinates are calculated from their image coordinates. The table "centroid" is the midpoint between the calculated centroids of the two poles. The orientation of the table is the direction perpendicular to the line segment connecting the two centroids.
If only one pole is seen, the same calculations are carried out, but the centroid is taken to be the centroid of the single pole alone.
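The centroid and orientation computation can be written as a small geometric helper; the coordinate convention and function name here are mine, not the project's.

```python
import math

def table_pose(pole_a, pole_b=None):
    """Table centroid and orientation from pole centroids in ground coordinates.

    pole_a, pole_b: (x, y) centroids of the detected poles. With a single
    visible pole, the table centroid falls back to that pole's centroid
    and no orientation is reported.
    """
    if pole_b is None:
        return pole_a, None
    cx = (pole_a[0] + pole_b[0]) / 2.0
    cy = (pole_a[1] + pole_b[1]) / 2.0
    # orientation is perpendicular to the segment joining the two centroids
    seg_angle = math.atan2(pole_b[1] - pole_a[1], pole_b[0] - pole_a[0])
    return (cx, cy), seg_angle + math.pi / 2.0
```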
To align itself correctly, Darwin needs to match the orientation of the table. Darwin also needs to be approximately 22mm from the centroid in order to stand directly over the table handles. This alignment is crucial for picking up the table without "whiff"ing or dropping it.
This alignment is achieved through a finite state machine (FSM) governing locomotion strategy. First, Darwin walks around and moves its head until it finds a pole. It then orients its body to face the poles (discovering both poles, since both soon come into the camera frame once one pole is focused upon). In the orientation process, it lines itself up 40mm from the centroid, so as not to knock into the handles during the orbiting phase that follows. In the orbiting phase, Darwin walks around the table while still facing it, continuing until it holds the same orientation as the table (within +/- 5 degrees). With roughly the same orientation, Darwin then approaches the table, controlling on its position and orientation until it is 22mm (+/- 1mm) from the centroid and facing the same direction. Both robots follow this same policy. If at any time Darwin is "off trajectory," it returns to a previous state and starts again.
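The state machine above can be sketched as a transition function. The observation field names are illustrative assumptions; the distance and angle thresholds are the ones quoted in the text.

```python
from enum import Enum, auto

class State(Enum):
    SEARCH = auto()
    ORIENT = auto()
    ORBIT = auto()
    APPROACH = auto()
    ALIGNED = auto()

def step(state, obs):
    """One FSM transition. `obs` holds the quantities the vision system
    reports each frame; its field names are hypothetical."""
    if state is State.SEARCH:
        return State.ORIENT if obs["pole_visible"] else State.SEARCH
    if state is State.ORIENT:
        # stand off 40 mm so the orbit does not clip the handles
        return State.ORBIT if abs(obs["distance_mm"] - 40) < 5 else State.ORIENT
    if state is State.ORBIT:
        # orbit the table until orientations agree within +/- 5 degrees
        return State.APPROACH if abs(obs["yaw_error_deg"]) <= 5 else State.ORBIT
    if state is State.APPROACH:
        if not obs["pole_visible"]:
            return State.SEARCH          # off trajectory: fall back and retry
        ok = abs(obs["distance_mm"] - 22) <= 1 and abs(obs["yaw_error_deg"]) <= 5
        return State.ALIGNED if ok else State.APPROACH
    return State.ALIGNED
```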
Each robot's state is constantly broadcast over WiFi, so upon reaching this "Aligned" state, a robot is effectively announcing "I'm ready" to its partner. When both robots are in "Aligned," they proceed to execute the pickup motion.
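The handshake amounts to serializing the local FSM state and checking the last packet heard from the peer. The JSON fields below are an illustrative wire format, not the actual one used on the robots.

```python
import json
import time

def make_state_packet(robot_id, state):
    """Serialize the robot's current FSM state for periodic broadcast.
    Field names are hypothetical placeholders."""
    return json.dumps({"id": robot_id, "state": state,
                       "t": time.time()}).encode()

def both_aligned(my_state, peer_packet):
    """The pickup motion may start only when this robot and the most
    recently heard peer packet both report the "Aligned" state."""
    peer = json.loads(peer_packet.decode())
    return my_state == "Aligned" and peer["state"] == "Aligned"
```

In practice, each robot would send such a packet at a fixed rate (e.g. over a UDP broadcast socket) and keep only the freshest one from its partner.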
The grasping strategy for the pickup motion combines keyframe motion with inverse kinematics (IK) and balance feedback. The desired hip height is sent to the ZMP engine for IK control, while the arms are given keyframe values over time.
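The keyframe side of this can be sketched as piecewise-linear interpolation between timed joint targets; the keyframe values in the example are placeholders, not the tuned pickup motion.

```python
def arm_angles(keyframes, t):
    """Piecewise-linear interpolation between (time, joint_angle_list)
    keyframes for the arm pickup motion."""
    if t <= keyframes[0][0]:
        return list(keyframes[0][1])
    for (t0, q0), (t1, q1) in zip(keyframes, keyframes[1:]):
        if t <= t1:
            a = (t - t0) / (t1 - t0)      # fraction of the way through segment
            return [x0 + a * (x1 - x0) for x0, x1 in zip(q0, q1)]
    return list(keyframes[-1][1])         # hold the final pose
```

At each control tick, the interpolated angles would go to the arm servos while the ZMP engine tracks the commanded hip height for balance.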
Finally, with the table in the hands of the two robots, a Harmonized Gait is used for stability. In this system, the touchdown times of each robot's steps are coordinated in either gallop or trot fashion. A trot synchronizes diagonal feet, while a gallop synchronizes directly opposing feet.
To accomplish this synchronization, each robot broadcasts its current stepping foot and its progress through that step. The opposing robot controls its step period using this information. Time-delay compensation is required due to the latency of the wireless communication link.
In trials, two separate portions of the system worked very well: gait harmonization, and walking to and picking up the table. However, transitioning from picking up the table to walking with it proved difficult.
Over a set of 5 trials, it took an average of 23 seconds for both Darwins to reach the correct alignment in front of the posts. The Darwins would fail to get into place if they decided to approach the same side of the stretcher.
Once in position, the Darwins would achieve a good grip on the stretcher (one where each Darwin holds it with both hands) 50% of the time, over 4 trials.
Once the stretcher was grasped, it took an average of 10 seconds of moving it backwards or forwards before the Darwins would fall. This average was taken over five trials, 2 of which were thrown out because a Darwin or the stretcher fell within 5 seconds due to a bad grip.
To test the synchronization, the sway (yaw) of the table was qualitatively observed. In these tests, the gallop style kept the rigid table steadier.
I believe the approach phase of the alignment can be made faster by increasing the step rate in various phases of the state machine. To keep the Darwins from running into each other, there are two remedies: first, add a Darwin-recognition function that detects a Darwin in an image; second, color the two poles differently.
To achieve a better grip, I believe the thresholds of the alignment process need to be tightened. Darwin would need to be even more confident of how close it is to the stretcher. This would sacrifice speed, however, as a slower gait would be required.
For the Darwins to walk together for a longer period of time, there are two immediately evident options. The first is to reduce the number of direction changes. In my trials, I used three or four direction changes in a relatively short period of time. These direction changes put the Darwins in misaligned positions and also loosen the grip. The second option is to control the distance between the robots in addition to synchronizing their gaits.
I would like to try more scenarios to test the differences between gallop and trot, as yaw is only one indication of stability.
In summary, the system was implemented fully on the robots, but certain parameters still need tuning. Potentially, an extra "wait" period should be added once the table is picked up, so that the two-robot system can stabilize before moving on to carrying it.
[1] Yutaka Inoue, Takahiro Tohge, and Hitoshi Iba. Cooperative Transportation by Humanoid Robots - Learning to Correct Positioning.
[2] Yutaka Inoue, Takahiro Tohge, and Hitoshi Iba. Learning to Acquire Autonomous Behavior - Cooperation by Humanoid Robots.
[3] Heonyoung Lim, Yeonsik Kang, Joongjae Lee, Jongwon Kim, Bum-Jae You. Multiple Humanoid Cooperative Control System for Heterogeneous Humanoid Team. Proceedings of the 17th IEEE International Symposium on Robot and Human Interactive Communication, Technische Universität München, Munich, Germany, August 1-3, 2008.
[4] S. Kamio, H. Iba. Object Transportation Using Two Humanoids. Intelligent Autonomous Systems 9. http://books.google.com/books?id=Z6u1un0AwVEC&lpg=PA318&ots=uwhTNNEUQA&lr&pg=PA325#v=onepage&q&f=false
Implement quadruped gaits on two humanoids that are coupled through a grasped rigid/soft body
Assign 3 or more humanoid actors to manipulate a common object
My Comments on the Final Presentations
The use of a tissue box was novel and useful. I would have liked to know what color the fourth side had.
I appreciated the optimization of the paths. How long did this take? Is it possible to "close the loop" for any kicks for online updates?
I loved the very clear examples of how your algorithm worked. The use of splines was clearly outlined as the best first approach. How does the technique scale with larger numbers of joints? Aside from arm planning, where do you see your technique being used?
I think it would benefit to describe briefly the hard coded/static areas of your project vs. the dynamic portions. I did like how you involved a "modular" approach, where you take any door detector module. Also - great job getting the pinch to work. Can the Scarab relay useful obstacle information to the PR2?
The approach seemed to scale well with different primitives. How does your algorithm scale with more degrees of freedom?
It would be cool to see the various situations where your approach works well compared to traditional approaches, and where yours fails - just to get a sense of "robustness". I appreciated the error metrics for response to a disturbance for visualizing an example use.
How can noise be modeled in your system? What happens with more agents?
What happens with two of those magnetized objects? Is independent control possible?
What is an appropriate application for me to use your technique?
I appreciated your chalkboard example scenarios where certain planners can fail, and the appropriate fixes. I'd like to see metrics for how these planners scale (time, memory, theoretical bounds)
O snap - I thought it was impressive how you were gathering data straight from point clouds. If you could make the objects colored and cylindrical/spherical, could you identify the objects more rapidly, a la Darwin's pink posts?
I liked how you explored the details of various pattern matching techniques with regard to scaling. How does K-means compare to connected components? Do you use backpack observations from one frame to the next to give a certain area in the image a higher probability of containing the backpack?
I liked seeing the algorithm with different numbers of agents (5 vs. 9). How does your method compare to other methods?
Impressive and thorough demonstration. I'd like to see how Microsoft's original algorithm compared to yours on a selection of images. Where did yours succeed where it failed? Vice versa.
How does your approach scale with the number of hunters and with the number of targets, a la MAGIC 2010?
I appreciated the descriptive animation - very informative how the robot behaved. Can you run the animations without video, though, to see that tic-tac-toe occupancy guess? Would this allow n-agents to be used?
How does your pitch compare to PhillieBot? Is there a video of your windup?
Great demonstration. Is there a way to make different color obstacles weigh different costs, so that you may cross some (rough terrain) pieces of paper, while still totally avoiding pink?
Well done. Is it possible to extend your method so that you can step over an obstacle, since the robots are humanoid?