
dc.contributor.author: Song, Haobei
dc.date.accessioned: 2019-09-12 17:41:34 (GMT)
dc.date.available: 2019-09-12 17:41:34 (GMT)
dc.date.issued: 2019-09-12
dc.date.submitted: 2019-09-05
dc.identifier.uri: http://hdl.handle.net/10012/15042
dc.description.abstract: The exploration/exploitation dilemma is a fundamental but often computationally intractable problem in reinforcement learning. The dilemma also affects data efficiency, which can be pivotal when the interactions between the agent and the environment are constrained. Traditional optimal control theory offers objective criteria, such as regret, whose optimization yields optimal exploration and exploitation. This approach has been successful for the multi-armed bandit problem but becomes impractical, and largely intractable to compute, for multi-state problems. For complex problems with large state spaces, where function approximation is applied, the exploration/exploitation decision at each interaction is in practice made in an ad hoc manner with heavy parameter tuning, for example with ε-greedy policies. Drawing on ideas from several research communities, optimal learning strives to find the optimal balance between exploration and exploitation by applying principles from optimal control theory. The contribution of this thesis consists of two parts: 1. to establish a theoretical framework of optimal learning based on reinforcement learning in a stochastic (non-Markovian) decision process and, through the lens of optimal learning, to unify Bayesian (model-based) reinforcement learning and partially observable reinforcement learning; 2. to improve existing reinforcement learning algorithms from the optimal learning viewpoint; the improved algorithms are referred to as approximate optimal learning algorithms. Three classes of approximate optimal learning algorithms are proposed, drawing respectively on the following principles: (1) approximate Bayesian inference explicitly, by training a recurrent neural network entangled with a feedforward neural network; (2) approximate Bayesian inference implicitly, by training and sampling from a pool of prediction neural networks used as dynamics models; (3) use a memory-based recurrent neural network to extract features from observations. Empirical evidence is provided to show the improvement of the proposed algorithms. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: reinforcement learning [en]
dc.subject: machine learning [en]
dc.subject: exploration [en]
dc.subject: exploitation [en]
dc.subject: optimal learning [en]
dc.subject: Bayesian reinforcement learning [en]
dc.subject: model based reinforcement learning [en]
dc.subject: neural network [en]
dc.title: Optimal Learning Theory and Approximate Optimal Learning Algorithms [en]
dc.type: Master Thesis [en]
dc.pending: false
uws-etd.degree.department: Electrical and Computer Engineering [en]
uws-etd.degree.discipline: Electrical and Computer Engineering [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.degree: Master of Applied Science [en]
uws.contributor.advisor: Tripunitara, Mahesh
uws.contributor.affiliation1: Faculty of Engineering [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
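The abstract's second class of algorithms (approximate Bayesian inference done implicitly by training and sampling from a pool of prediction neural networks used as dynamics models) can be illustrated with a minimal Python/PyTorch sketch. Everything below, including the DynamicsModel class, the layer sizes, the toy transition data, and the per-episode uniform sampling of one model, is an assumption made for illustration; it is not the thesis's actual implementation.

# Minimal sketch of the "pool of prediction networks" idea from the abstract:
# several dynamics models are trained on the same transitions, and one model is
# sampled per episode to drive exploration (a Thompson-sampling-style use of the
# ensemble). All names, sizes, and data here are illustrative assumptions.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Predicts the next state from the current state and action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def train_ensemble(models, transitions, epochs: int = 50):
    """Fit every model in the pool on (state, action, next_state) tuples.

    Each model gets its own optimizer; differences in initialization (and,
    optionally, bootstrapped resampling of the data) keep the pool diverse,
    which is what lets sampling from it act as an implicit posterior over
    the environment dynamics.
    """
    optims = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in models]
    states, actions, next_states = transitions
    for _ in range(epochs):
        for model, opt in zip(models, optims):
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(states, actions), next_states)
            loss.backward()
            opt.step()

def sample_model(models):
    """Draw one dynamics model uniformly; plan and explore with it for an episode."""
    idx = torch.randint(len(models), (1,)).item()
    return models[idx]

if __name__ == "__main__":
    state_dim, action_dim, n_models = 4, 2, 5
    models = [DynamicsModel(state_dim, action_dim) for _ in range(n_models)]
    # Toy transitions standing in for real agent-environment interaction data.
    states = torch.randn(256, state_dim)
    actions = torch.randn(256, action_dim)
    next_states = states + 0.1 * actions.sum(dim=-1, keepdim=True)
    train_ensemble(models, (states, actions, next_states))
    chosen = sample_model(models)
    print(chosen(states[:1], actions[:1]))

Sampling a single model per episode and acting as if its predictions were the true dynamics is one common way to turn ensemble disagreement into directed exploration; the diversity of the pool plays the role of an implicit posterior over dynamics, in contrast to the heavily tuned ε-greedy baseline the abstract mentions.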

