Asking for Help with a Cost in Reinforcement Learning

dc.contributor.author: Vandenhof, Colin
dc.date.accessioned: 2020-05-15T19:43:16Z
dc.date.available: 2020-05-15T19:43:16Z
dc.date.issued: 2020-05-15
dc.date.submitted: 2020-04-16
dc.description.abstract: Reinforcement learning (RL) is a powerful tool for developing intelligent agents, and the use of neural networks makes RL techniques more scalable to challenging real-world applications, from task-oriented dialogue systems to autonomous driving. However, one of the major bottlenecks to the adoption of RL is efficiency, as it often takes many time steps to learn an acceptable policy. To address this problem, we investigate the idea of allowing the agent to ask for advice from a teacher. We formalize this concept in a framework called ask-for-help RL, which entails augmenting a Markov decision process with a teacher-query action that can be taken at a fixed cost in any state. In this task, the agent faces a dilemma between exploration, exploitation, and teacher-querying. To make this trade-off, we propose an action selection strategy that is rooted in the classical notion of value-of-information, and suggest a practical implementation that is based on deep Q-learning. This algorithm, called VOE/Q, can jointly decide between taking a particular environment action or querying the teacher, and is sensitive to the query cost. We then perform experiments in two domains: a maze navigation task and the Atari game Freeway. When the teacher is excluded, the algorithm shows substantial gains over many other exploration strategies from the literature. With the teacher included, we again find that the algorithm outperforms baselines. By taking advantage of the teacher, higher cumulative reward can be achieved than with standard RL alone. Together, our results point to a promising approach to both RL and ask-for-help RL. [en]
dc.identifier.uri: http://hdl.handle.net/10012/15872
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.subject: reinforcement learning [en]
dc.subject: apprenticeship learning [en]
dc.subject: imitation learning [en]
dc.subject: learning from demonstration [en]
dc.subject: human-in-the-loop [en]
dc.subject: interactive reinforcement learning [en]
dc.subject: deep reinforcement learning [en]
dc.subject: active learning [en]
dc.subject.lcsh: Reinforcement learning [en]
dc.subject.lcsh: Active learning [en]
dc.title: Asking for Help with a Cost in Reinforcement Learning [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.degree.department: David R. Cheriton School of Computer Science [en]
uws-etd.degree.discipline: Computer Science [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws.contributor.advisor: Law, Edith
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: Vandenhof_Colin.pdf
Size: 1.17 MB
Format: Adobe Portable Document Format
