Use of Slip Prediction for Learning Grasp-Stability Policies in Robotic-Grasp Simulation

Date

2023-07-04

Authors

Stracovsky, Lukas

Advisor

Kofman, Jonathan
Jeon, Soo

Publisher

University of Waterloo

Abstract

The purpose of prosthetic hands is to restore a portion of the dexterity lost through upper-limb amputation. However, a key capability of human grasping that is missing from most currently available prosthetic hands is the ability to adapt grasp forces in response to slip or disturbances without visual information. Current prosthetic hands lack the integrated tactile sensors and control policies needed to support adaptive grasp stabilization or manipulation. Research on slip detection and classification has provided a pathway towards integrating tactile sensors on robotic and prosthetic hands; however, the current literature focuses on specific sensors and simple graspers. Policies that use slip prediction to adapt grasp forces remain largely unexplored.

Rigid-body simulations have recently emerged as a useful tool for training control policies due to improvements in machine-learning techniques, since they allow large amounts of interaction data to be generated for training. However, because simulations only approximate reality, policies trained in simulation may not transfer to physical systems. Several grasp policies with impressive dexterity have been trained in simulation and transferred successfully to physical systems, but these policies used visual data rather than tactile data as policy inputs. This research investigates whether rigid-body simulations can use slip prediction as the primary input for training grasp-stabilization policies.

Because the current slip detection and prediction literature is based on specific tactile sensors and grasper setups, testing slip-reactive grasp policies is difficult, especially with an anthropomorphic hand. As an alternative to implementing a system-specific policy, real human grasp poses and motion trajectories were used to test whether the trained policy could replicate known human grasp stability. To acquire the human grasp data, grasp and motion trajectories from a human motion-capture dataset were adapted into the simulation. Since motion capture provides only grasp and object pose data, grasp forces had to be inferred through a combination of analytical and iterative methods. Simulated contacts are themselves only approximate models; therefore, slip in the simulation was characterized for detection and prediction.

The stability of the converted grasps was tested by simulating the grasp manipulation episodes with no control policy. Viable grasps were expected to maintain stability until the manipulation trajectory caused grasp degradation or loss. The initial grasps maintained stability for an average of 27.7% of the grasp-episode duration, though with a wide standard deviation of 35%. The large standard deviation is due to episodes with high hand-acceleration trajectories, as well as grasp objects of varying grasping difficulty.

Policy training using the imported grasps and trajectories was performed with reinforcement learning, specifically proximal policy optimization. Policies were trained with and without slip-prediction inputs, using two different reward functions: a reward proportional to the duration of grasp stability, and a reward that added a grasp-force magnitude penalty. A multi-layer perceptron was used as the policy function approximator. The policies without slip-prediction inputs did not converge, while the policy with slip inputs and the grasp-force-penalty reward function converged on a poorly performing policy.
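As a rough illustration of the two reward shapes described above, the per-step rewards could be computed as in the following Python sketch; the function names, the boolean stability flag, and the penalty weight are hypothetical placeholders rather than the thesis implementation.

import numpy as np

def stability_reward(grasp_stable: bool) -> float:
    # A constant per-step bonus while the grasp remains stable; summed over
    # an episode, this is proportional to the duration of grasp stability.
    return 1.0 if grasp_stable else 0.0

def stability_reward_with_force_penalty(grasp_stable: bool,
                                        grasp_forces: np.ndarray,
                                        force_weight: float = 0.01) -> float:
    # Same stability bonus, minus a penalty proportional to the magnitude of
    # the applied grasp forces (force_weight is a hypothetical coefficient).
    return stability_reward(grasp_stable) - force_weight * float(np.linalg.norm(grasp_forces))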
On average, episodes tested with the policy that used the grasp-force penalty showed a 0.11 s reduction in grasp-stability duration compared to the initial (no-policy) grasp-duration results. However, episodes whose stability did improve under the learned policy improved by 0.38 s on average, significantly more than the average stability loss. Moreover, the change in stability duration under the trained policy was negatively correlated with the initial stability duration (Pearson r = -0.69, p = 9.79e-11). These results suggest that slip predictions contribute to learned grasp policies, and that reward shaping is critical to the grasp-stability task. Ultimately, the trained policies did not perform better than the baseline no-policy grasp stability, suggesting that the slip predictions were not sufficient to train reasonable grasp policies in simulation.
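A correlation of this kind could be computed on per-episode data with scipy.stats.pearsonr, as in this minimal sketch; the arrays below are hypothetical placeholders, not the thesis results.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-episode stability durations in seconds (placeholder data).
initial_duration = np.array([0.2, 1.5, 0.1, 2.3, 0.4, 1.0])
policy_duration = np.array([0.6, 1.1, 0.5, 1.8, 0.9, 0.8])

# Change in stability duration under the trained policy, and its correlation
# with the initial (no-policy) stability duration.
change = policy_duration - initial_duration
r, p = pearsonr(initial_duration, change)
print(f"Pearson r = {r:.2f}, p = {p:.2e}")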

Description

Keywords

MuJoCo, Reinforcement Learning, Grasp Stabilization

LC Keywords

Citation