Asking for Help with a Cost in Reinforcement Learning
Reinforcement learning (RL) is a powerful tool for developing intelligent agents, and the use of neural networks makes RL techniques more scalable to challenging real-world applications, from task-oriented dialogue systems to autonomous driving. However, one of the major bottlenecks to the adoption of RL is sample efficiency, as it often takes many time steps to learn an acceptable policy. To address this problem, we investigate the idea of allowing the agent to ask for advice from a teacher. We formalize this concept in a framework called ask-for-help RL, which augments a Markov decision process with a teacher-query action that can be taken at a fixed cost in any state. In this setting, the agent must balance three competing options: exploration, exploitation, and querying the teacher. To make this trade-off, we propose an action selection strategy that is rooted in the classical notion of value-of-information, and suggest a practical implementation that is based on deep Q-learning. This algorithm, called VOE/Q, can jointly decide between taking a particular environment action or querying the teacher, and is sensitive to the query cost. We then perform experiments in two domains: a maze navigation task and the Atari game Freeway. When the teacher is excluded, the algorithm shows substantial gains over many other exploration strategies from the literature. With the teacher included, we again find that the algorithm outperforms baselines. By taking advantage of the teacher, the agent can achieve higher cumulative reward than with standard RL alone. Together, our results point to a promising approach to both RL and ask-for-help RL.
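As a rough illustration of the ask-for-help setting described in the abstract, the sketch below wraps a toy environment with one extra teacher-query action that carries a fixed cost. It is not the thesis's VOE/Q algorithm, and it makes several assumptions that the abstract does not state: the names `ChainMDP`, `AskForHelpEnv`, `QUERY`, and `follow_teacher` are hypothetical, a query is assumed to leave the state unchanged and return the teacher's recommended action, and the cost is modeled as a negative reward.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

QUERY = -1  # hypothetical id reserved for the teacher-query action


@dataclass
class ChainMDP:
    """Toy corridor (an assumption for illustration): states 0..n-1,
    reward 1.0 on reaching the right end, which also ends the episode."""
    n: int = 5
    state: int = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # action 1 moves right, any other action moves left
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n - 1)
        done = self.state == self.n - 1
        return self.state, (1.0 if done else 0.0), done


class AskForHelpEnv:
    """Adds a fixed-cost teacher-query action, available in every state."""

    def __init__(self, env: ChainMDP, teacher: Callable[[int], int],
                 query_cost: float) -> None:
        self.env = env
        self.teacher = teacher
        self.query_cost = query_cost

    def reset(self) -> int:
        return self.env.reset()

    def step(self, action: int) -> Tuple[int, float, bool, Optional[int]]:
        if action == QUERY:
            # Assumed semantics: the state does not change, the agent pays
            # the cost, and it receives the teacher's recommended action.
            advice = self.teacher(self.env.state)
            return self.env.state, -self.query_cost, False, advice
        state, reward, done = self.env.step(action)
        return state, reward, done, None


def follow_teacher(env: AskForHelpEnv, max_steps: int = 100) -> float:
    """Baseline that queries before every move and obeys the advice."""
    env.reset()
    total = 0.0
    for _ in range(max_steps):
        _, cost, _, advice = env.step(QUERY)
        _, reward, done, _ = env.step(advice)
        total += cost + reward
        if done:
            break
    return total
```

A policy that always queries pays the cost at every step, so its return falls as the query cost rises. An agent that is sensitive to the query cost, as the abstract describes, would instead query only when the expected benefit of the advice outweighs that cost.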
Cite this version of the work
Colin Vandenhof (2020). Asking for Help with a Cost in Reinforcement Learning. UWSpace. http://hdl.handle.net/10012/15872