A New Approach to Reinforcement Learning for Sequential Robotic Tasks using a Chained Options Model and Subtask-Focused Rewards
Date
2021-09-09
Authors
Daga, Somesh
Publisher
University of Waterloo
Abstract
Reinforcement Learning for Robotics is a trending area of research with tremendous
potential for wide-scale industry adoption. However, robotic agents typically require
large numbers of environment interactions to discover good behaviours.
In response, Hierarchical Reinforcement Learning methods are gaining traction and have
demonstrated improved learning efficiency by employing abstractions in the learning
process. Additionally, the safety implications of black-box agents operating in
physical environments have generated interest in explainable forms of learning.
In this thesis, we leverage a popular form of Hierarchical Reinforcement Learning,
known as the Options Framework, to address learning for tasks that may be expressed
as a sequential composition of subtasks. This form of task decomposition is prevalent
in classical approaches to many robotic planning and control applications, and offers an
avenue to segment tasks into sets of distinct and interpretable behaviors.
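To make the idea of a sequential composition of subtasks concrete, here is a minimal sketch under stated assumptions: a toy 1-D gripper position stands in for the robot state, and each subtask is a hypothetical option consisting of a policy and a termination condition. Options run strictly in order, and control passes to the next option when the current one terminates. The names `Option`, `run_chain`, `reach` and `transport`, and the hand-coded policies, are illustrative assumptions rather than the thesis's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy state: 1-D gripper position (assumption for illustration only).
State = float


@dataclass
class Option:
    name: str
    policy: Callable[[State], float]    # intra-option policy: state -> displacement
    terminate: Callable[[State], bool]  # deterministic termination condition beta(s)


def run_chain(options: List[Option], s0: State,
              max_steps: int = 100) -> Tuple[State, List[Tuple[str, State]]]:
    """Execute options in a fixed order. Option k acts until its termination
    condition holds, then control passes to option k + 1."""
    s, k, steps = s0, 0, 0
    transitions: List[Tuple[str, State]] = []
    while k < len(options) and steps < max_steps:
        opt = options[k]
        if opt.terminate(s):
            transitions.append((opt.name, s))  # record the subtask hand-off
            k += 1
        else:
            s += opt.policy(s)  # take one primitive step with the active option
            steps += 1
    return s, transitions


# Hypothetical pick-and-place chain: reach the object at x = 5, then carry it to x = 2.
options = [
    Option("reach", policy=lambda s: 1.0, terminate=lambda s: s >= 5.0),
    Option("transport", policy=lambda s: -1.0, terminate=lambda s: s <= 2.0),
]

if __name__ == "__main__":
    final_state, hand_offs = run_chain(options, 0.0)
    print(final_state, hand_offs)  # 2.0 [('reach', 5.0), ('transport', 2.0)]
```

In the setting the abstract describes, the subtask policies and the timing of transitions are learned; here they are hand-coded so that the control flow of a fixed chain is easy to follow.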
As our primary contribution, we propose a novel potential-based reward formulation
and decomposition that is conducive to subtask behavior specialization and incentivizes a
learning agent to solve the composite task under the Options Framework. As a result, we
offer increased visibility into the actions of the agent at the subtask level. An off-policy
Maximum Entropy Deep Reinforcement Learning algorithm is developed to simultaneously
discover relevant policies across subtasks and determine when to transition between
subtasks in an end-to-end learning scheme. Furthermore, we propose a chained option
execution model to leverage expert knowledge of the task and promote stability in the
learning of subtask transitions. Finally, segmenting agent behaviors at the subtask level
allows for the injection of expert knowledge into the action spaces of individual subtasks,
which we exploit through the use of default actions.
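The abstract does not spell out the reward formulation itself. As background only, standard potential-based shaping (Ng, Harada and Russell, 1999) adds a term F(s, s') = γΦ(s') − Φ(s) to the environment reward. The sketch below assumes a hypothetical per-subtask potential equal to the negative distance to a 1-D subtask goal; it illustrates the generic shaping form, not the thesis's specific decomposition across subtasks.

```python
from typing import Callable, Sequence

Potential = Callable[[float], float]


def shaping_reward(phi: Potential, s: float, s_next: float, gamma: float) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def subtask_potential(goal: float) -> Potential:
    """Hypothetical subtask potential: negative distance to the subtask goal."""
    return lambda s: -abs(goal - s)


def discounted_shaping_return(phi: Potential, states: Sequence[float], gamma: float) -> float:
    """Sum of gamma^t * F(s_t, s_{t+1}) over a trajectory s_0, ..., s_T."""
    return sum(gamma ** t * shaping_reward(phi, states[t], states[t + 1], gamma)
               for t in range(len(states) - 1))


if __name__ == "__main__":
    phi = subtask_potential(goal=5.0)
    trajectory = [0.0, 2.0, 1.0, 4.0]
    gamma = 0.9
    total = discounted_shaping_return(phi, trajectory, gamma)
    # Telescoping: the sum depends only on the endpoints, gamma^T * phi(s_T) - phi(s_0).
    closed_form = gamma ** (len(trajectory) - 1) * phi(trajectory[-1]) - phi(trajectory[0])
    print(total, closed_form)
```

Because the shaping terms telescope, the discounted shaping return of a trajectory depends only on its start and end states, which is the property underlying the policy-invariance result of Ng et al. (1999).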
We demonstrate our approach in high-dimensional, simulated 2D and 3D manipulator
environments on pick-and-place and door-opening tasks.
Keywords
reinforcement learning, robotics, manipulation