
dc.contributor.author: Vandenhof, Colin
dc.date.accessioned: 2020-05-15 19:43:16 (GMT)
dc.date.available: 2020-05-15 19:43:16 (GMT)
dc.date.issued: 2020-05-15
dc.date.submitted: 2020-04-16
dc.identifier.uri: http://hdl.handle.net/10012/15872
dc.description.abstract: Reinforcement learning (RL) is a powerful tool for developing intelligent agents, and the use of neural networks makes RL techniques more scalable to challenging real-world applications, from task-oriented dialogue systems to autonomous driving. However, one of the major bottlenecks to the adoption of RL is sample efficiency, as it often takes many time steps to learn an acceptable policy. To address this problem, we investigate the idea of allowing the agent to ask for advice from a teacher. We formalize this concept in a framework called ask-for-help RL, which augments a Markov decision process with a teacher-query action that can be taken at a fixed cost in any state. In this task, the agent faces a three-way trade-off between exploration, exploitation, and teacher-querying. To navigate this trade-off, we propose an action selection strategy rooted in the classical notion of value of information, and suggest a practical implementation based on deep Q-learning. The resulting algorithm, called VOE/Q, jointly decides between taking a particular environment action and querying the teacher, and is sensitive to the query cost. We then perform experiments in two domains: a maze navigation task and the Atari game Freeway. When the teacher is excluded, the algorithm shows substantial gains over many other exploration strategies from the literature. With the teacher included, we again find that the algorithm outperforms the baselines. By taking advantage of the teacher, the agent achieves higher cumulative reward than with standard RL alone. Together, our results point to a promising approach to both RL and ask-for-help RL.
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: reinforcement learning
dc.subject: apprenticeship learning
dc.subject: imitation learning
dc.subject: learning from demonstration
dc.subject: human-in-the-loop
dc.subject: interactive reinforcement learning
dc.subject: deep reinforcement learning
dc.subject: active learning
dc.subject.lcsh: Reinforcement learning
dc.subject.lcsh: Active learning
dc.title: Asking for Help with a Cost in Reinforcement Learning
dc.type: Master Thesis
dc.pending: false
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Master of Mathematics
uws.contributor.advisor: Law, Edith
uws.contributor.affiliation1: Faculty of Mathematics
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate
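
The abstract above outlines the ask-for-help setting: a Markov decision process augmented with a teacher-query action that can be taken at a fixed cost in any state, with action selection driven by a value-of-information criterion on top of Q-learning. The following is a minimal sketch of that idea only, not the thesis's VOE/Q algorithm: the class name QueryingAgent, the QUERY sentinel, the ensemble-of-Q-tables uncertainty estimate, and the particular value-of-information proxy are all illustrative assumptions.

"""Minimal sketch of ask-for-help action selection in the spirit of the
abstract above.  All names (QueryingAgent, QUERY, voi) and the ensemble-based
uncertainty estimate are illustrative assumptions, not the thesis's VOE/Q."""
import numpy as np

QUERY = -1  # sentinel returned when the agent decides to ask the teacher


class QueryingAgent:
    """Tabular Q-learning agent that may pay a fixed cost to query a teacher."""

    def __init__(self, n_states, n_actions, query_cost,
                 n_heads=5, lr=0.1, gamma=0.99, seed=0):
        self.rng = np.random.default_rng(seed)
        # An ensemble of Q-tables stands in for the uncertainty estimate that a
        # deep Q-learning implementation might obtain from bootstrapped heads.
        self.q = self.rng.normal(scale=0.01,
                                 size=(n_heads, n_states, n_actions))
        self.query_cost = query_cost
        self.lr = lr
        self.gamma = gamma

    def select_action(self, state):
        """Return QUERY when the estimated value of information exceeds the
        query cost, otherwise the greedy environment action."""
        q_s = self.q[:, state, :]              # shape: (n_heads, n_actions)
        mean_q = q_s.mean(axis=0)
        greedy = int(mean_q.argmax())
        # Crude value-of-perfect-information proxy: how much better the
        # per-head best actions look, on average, than committing to greedy.
        voi = float(q_s.max(axis=1).mean() - mean_q[greedy])
        return QUERY if voi > self.query_cost else greedy

    def update(self, state, action, reward, next_state, done, p_update=0.5):
        """Q-learning backup applied to a random subset of heads (a simple
        bootstrap) so the ensemble keeps expressing disagreement."""
        target = reward if done else (
            reward + self.gamma * self.q[:, next_state, :].max(axis=1))
        td_error = target - self.q[:, state, action]
        mask = self.rng.random(self.q.shape[0]) < p_update
        self.q[:, state, action] += self.lr * mask * td_error

In a training loop, a return value of QUERY would mean the agent pays query_cost, asks the teacher which action to take in the current state, executes it, and still calls update() so the advice also improves its value estimates; an implementation based on deep Q-learning, as in the thesis, would replace the tabular ensemble with a neural network and a learned uncertainty estimate.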



