Robotic Reach, Grasp, and Pick-and-Place using Combined Reinforcement Learning and Traditional Controls

Lobbezoo, Andrew

Files

Lobbezoo_Andrew.pdf (3.81 MB)

Date

2022-09-01

Authors

Lobbezoo, Andrew

Advisor

Kwon, Hyock Ju

Publisher

University of Waterloo

Abstract

Since they were first developed in the 1970s, electrically actuated robotic arms have been used to complete tasks that are repetitive, strenuous, and/or dangerous. More than 50 years have passed since initial development; however, robots in factories today are still operated with conventional control strategies that require individual programming on a task-by-task basis, with no margin for error. The implementation of conventional controls relies on experienced technicians and skilled robotics engineers sending commands through graphical or text-based programming interfaces to perform simple actions. Although automation has been shown to drastically increase productivity and reduce workplace injuries, the initial time and R&D cost of setting up robotic agents with traditional methods is presently too large for many firms. As an alternative to traditional operation planning and task programming, machine learning has shown significant promise with the development of control strategies based on reinforcement learning (RL). With RL, robotic agents can be presented with a task that they learn to solve by exploring various action sequences in the real world or in internal simulated models of the environment. There are some existing RL applications; however, most examples are based on relatively simple video games and basic robotic tasks (inverted pendulum, vector-based reach, and so on). Additionally, the documentation for much of this research is limited, there is little real-world testing, and there is room for significant improvement in performance. The objective of this project is to implement RL-based control strategies in simulated and real environments to validate the RL approach for standard industrial tasks such as reach, grasp, and pick-and-place. The goal of this approach is to bring intelligence to robotic control so that tasks can be completed without precisely defining the environment, target object positions, and action plan.
To achieve the primary objective of this research, the following sub-objectives were pursued: 1) develop a custom simulation task environment, 2) create an RL pipeline for tuning and training a robotic RL agent, 3) develop a methodology for a novel semi-supervised RL system for improving image-based RL, 4) set up the Panda robot and establish a communication, control, and path planning system, and 5) tune, train, and test simulated and real-world RL-based control. After developing the environments, creating the training and tuning framework, and establishing the real-world robotic control in objectives 1 to 4, extensive training and testing were conducted in objective 5. Results from testing showed that model performance was highly dependent on task difficulty. Training an RL network in simulation with coordinate-based positional feedback yielded a high task completion rate. For this simulation set, the robotic agents were able to independently learn to complete tasks with high precision and repeatability. By comparison, training a network in simulation with image-based positional feedback produced poor results. For the image-based tasks, the agent converged on sub-optimal solutions and underperformed expectations due to difficulties in training the CNN positional location extractor. To overcome the issues with image-based RL training, the novel semi-supervised RL approach was implemented and tested. The results from this testing indicate that RL training performed well with image-based inputs given a pre-trained feature extractor. The semi-supervised methodology shows potential; however, this approach has the downside of requiring additional data collection for supervised training. After training in simulation, real-world reach, grasp, and pick-and-place testing was completed with coordinate-based positional inputs.
The real-world testing validated the communication framework between the simulated and real environments and indicated that transferring policies to the real world was possible. Accuracy of coordinate-based reach, grasp, and pick-and-place was reduced by 10-20% compared to the simulation environment, which indicates that additional model calibration is required. The results from this research provide optimistic preliminary data on the application of RL to robotics. Further research, which is required to bridge the gaps in image-based learning, should include network generalization, domain adaptation, and imitation learning.