Generalization on Text-based Games using Structured Belief Representations

Adhikari, Ashutosh Devendrakumar

dc.contributor.advisor	Lin, Jimmy
dc.contributor.advisor	Poupart, Pascal
dc.contributor.author	Adhikari, Ashutosh Devendrakumar
dc.date.accessioned	2020-12-23T19:36:03Z
dc.date.available	2020-12-23T19:36:03Z
dc.date.issued	2020-12-23
dc.date.submitted	2020-12-21
dc.description.abstract	Text-based games are complex, interactive simulations in which a player must process text describing the underlying state of the world and issue textual commands to advance in the game. Playing these games can be formulated as acting in a partially observable Markov decision process (POMDP): the player must issue actions that reach the goal by optimizing rewards, given textual observations that may not fully describe the underlying state. Prior work has focused on developing agents that achieve high rewards or faster convergence to the optimal policy on single games. However, with the recent advances in reinforcement learning and representation learning for language, we argue it is imperative to start looking for agents that can play a set of games drawn from a distribution of games rather than a single game at a time. In this work, we use TextWorld as a testbed for developing generalizable policies and benchmarking them against previous work. TextWorld is a sandbox environment for training and evaluating reinforcement learning agents on text-based games. It is suitable for testing the generalizability of agents because it lets us generate hundreds of unique games with varying levels of difficulty. Difficulty in text-based games is determined by a variety of factors, such as the number of locations in the environment and the length of the optimal walkthrough. Playing text-based games requires skills in sequential decision making and language processing. In this thesis, we evaluate the learnt control policies by training them on a set of games and then observing their scores on games not seen during training. We assess the quality of the learnt policies, their ability to generalize over a distribution of games, and their ability to transfer to games from different distributions.
We define game distributions based on difficulty level, parameterized by the number of locations in the game, the number of objects, etc. We propose generalizable and transferable policies by extracting structured information from the raw textual observations describing the state. Additionally, our agents learn these policies in a purely data-driven fashion, without the handcrafted components commonly used in prior work. Specifically, we learn dynamic knowledge graphs from raw text to represent our agents' beliefs. The dynamic belief graphs (a) allow agents to extract relevant information from text observations and (b) act as memory for acting optimally in the POMDP. Experiments on 500+ different games from the TextWorld suite show that our best agent outperforms previous baselines by an average of 24.2%.	en
dc.identifier.uri	http://hdl.handle.net/10012/16604
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	Natural Language Processing	en
dc.subject	Machine Learning	en
dc.subject	Reinforcement Learning	en
dc.subject	Graph Representation Learning	en
dc.title	Generalization on Text-based Games using Structured Belief Representations	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Mathematics	en
uws-etd.degree.department	David R. Cheriton School of Computer Science	en
uws-etd.degree.discipline	Computer Science	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Lin, Jimmy
uws.contributor.advisor	Poupart, Pascal
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Adhikari_Ashutosh_Devendrakumar.pdf
Size: 4.49 MB
Format: Adobe Portable Document Format
Description: Main article

Collections

Theses
Computer Science