
Robotic Grasping using Demonstration and Deep Learning


Date

2019-09-11

Authors

Reyes Osorio, Victor

Publisher

University of Waterloo

Abstract

Robotic grasping is a challenging task that has been approached in a variety of ways. Historically, grasping has been treated as a control problem: if the forces between the robotic gripper and the object can be calculated and controlled accurately, then grasps can be easily planned. However, these methods are difficult to extend to unknown objects or to a variety of robotic grippers. Using human-demonstrated grasps is another way to tackle the problem. Under this approach, a human operator guides the robot through the grasping task in a training phase, and the useful information from each demonstration is then extracted. Unlike traditional control systems, demonstration-based systems do not explicitly state what forces are necessary, and they also allow the system to learn to manipulate the robot directly. The major failing of this approach, however, is the sheer amount of data required to cover a substantial portion of objects and use cases. Recently, various deep learning grasping systems have achieved impressive levels of performance. These systems learn to map perceptual features, such as color images and depth maps, to gripper poses. They can learn complicated relationships, but still require massive amounts of data to train properly. A common way of collecting this data is to run physics-based simulations built on the control schemes mentioned above; however, human-demonstrated grasps remain the gold standard for grasp planning.

We therefore propose a data collection system that can be used to collect a large number of human-demonstrated grasps. In this system the human demonstrator holds the robotic gripper in one hand and naturally uses it to perform grasps. These grasp poses are tracked in all six degrees of freedom, and RGB-D images are collected for each grasp trial, showing the object and any obstacles present during the trial. Using this system, we collected 40,000 annotated grasp demonstrations. This dataset is available online.

We test a subset of these grasps for their robustness to perturbations by replicating scenes captured during data collection and using a robotic arm to replicate the collected grasps. We find that we can replicate the scenes with low variance, which, coupled with the robotic arm's low repeatability error, means that we can test a wide variety of perturbations. Our tests show that the grasps maintain a probability of success above 90% for perturbations of up to 2.5 cm or 10 degrees.

We then train a variety of neural networks to map images of grasping scenes to final grasp poses. We split pose prediction across two networks: one to predict the position of the gripper, and one to predict the orientation conditioned on the output of the position network. These networks are trained to classify whether a particular position or orientation is likely to lead to a successful grasp. We also identified a strong prior in our dataset over the distribution of grasp positions, and we leverage this information by tasking the position network with predicting corrections to this prior based on the image presented to it. Our final network architecture, which uses layers from a pre-trained state-of-the-art image classification network together with residual convolution blocks, did not seem able to learn the grasping task. We observed a strong tendency for the networks to overfit, even when they were heavily regularized and their parameter counts were reduced substantially. The best position network we were able to train collapses to predicting only a few possible positions, leading the orientation network to predict only a few possible orientations as well. Limited testing on a robotic platform confirmed these findings.
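
To make the two-stage architecture described in the abstract more concrete, the following is a minimal PyTorch sketch, not the thesis implementation. The backbone choice (ResNet-18), the 4-channel RGB-D input, the quaternion orientation encoding, all layer widths, and the scoring interface (evaluating a candidate position offset and orientation as grasp-success classifiers) are assumptions made purely for illustration of the position-then-orientation conditioning.

```python
# Hypothetical sketch of a two-stage grasp-pose success classifier (not the thesis code).
# Assumptions: ResNet-18 backbone, 4-channel RGB-D input, position expressed as an
# offset ("correction") to a fixed dataset prior, orientation as a unit quaternion.
import torch
import torch.nn as nn
import torchvision.models as models


class PositionNet(nn.Module):
    """Scores a candidate gripper position, given as a correction to the dataset prior."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights="IMAGENET1K_V1")  # pre-trained image layers
        # Accept RGB-D (4 channels) instead of RGB.
        backbone.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
        backbone.fc = nn.Identity()            # expose 512-d image features
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512 + 3, 256), nn.ReLU(),
            nn.Linear(256, 1),                 # logit: grasp success at this position
        )

    def forward(self, rgbd, position_offset):
        feats = self.backbone(rgbd)
        logit = self.head(torch.cat([feats, position_offset], dim=1))
        return logit, feats                    # features are reused by OrientationNet


class OrientationNet(nn.Module):
    """Scores a candidate orientation, conditioned on the image features and the
    position selected by PositionNet."""

    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(512 + 3 + 4, 256), nn.ReLU(),
            nn.Linear(256, 1),                 # logit: grasp success at this full pose
        )

    def forward(self, feats, position_offset, quaternion):
        return self.head(torch.cat([feats, position_offset, quaternion], dim=1))


# Example forward pass; training would use binary cross-entropy on success labels,
# e.g. F.binary_cross_entropy_with_logits(logit, label).
rgbd = torch.randn(1, 4, 224, 224)
offset = torch.zeros(1, 3)                     # zero correction = the prior position
quat = torch.tensor([[0.0, 0.0, 0.0, 1.0]])    # identity orientation
pos_net, ori_net = PositionNet(), OrientationNet()
pos_logit, feats = pos_net(rgbd, offset)
ori_logit = ori_net(feats, offset, quat)
```

In this reading, candidate corrections and orientations would be scored and the highest-scoring pose executed; the exact way the thesis generates and selects candidates is not specified in the abstract.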

Description

Keywords

deep learning, grasping, computer vision, programming by demonstration, neural networks

LC Keywords

Citation