Optimal Learning Theory and Approximate Optimal Learning Algorithms

dc.contributor.author: Song, Haobei
dc.date.accessioned: 2019-09-12T17:41:34Z
dc.date.available: 2019-09-12T17:41:34Z
dc.date.issued: 2019-09-12
dc.date.submitted: 2019-09-05
dc.description.abstract: The exploration/exploitation dilemma is a fundamental but often computationally intractable problem in reinforcement learning. The dilemma also affects data efficiency, which can be pivotal when interactions between the agent and the environment are constrained. Traditional optimal control theory defines an objective criterion, such as regret, whose optimization (here, minimization) yields an optimal balance of exploration and exploitation. This approach has been successful in the multi-armed bandit problem but becomes impractical, and mostly intractable to compute, for multi-state problems. For complex problems with large state spaces, where function approximation is applied, the exploration/exploitation decision at each interaction is in practice typically made in an ad hoc manner with heavy parameter tuning, e.g., ε-greedy. Drawing inspiration from several research communities, optimal learning strives to find the optimal balance between exploration and exploitation by applying principles from optimal control theory. The contribution of this thesis consists of two parts: 1. to establish a theoretical framework of optimal learning based on reinforcement learning in a stochastic (non-Markovian) decision process and, through the lens of optimal learning, to unify Bayesian (model-based) reinforcement learning and partially observable reinforcement learning; 2. to improve existing reinforcement learning algorithms from the optimal learning viewpoint; the improved algorithms are referred to as approximate optimal learning algorithms. Three classes of approximate optimal learning algorithms are proposed, drawing on the following principles respectively: (1) approximate Bayesian inference explicitly, by training a recurrent neural network entangled with a feedforward neural network; (2) approximate Bayesian inference implicitly, by training and sampling from a pool of prediction neural networks serving as dynamics models; (3) use a memory-based recurrent neural network to extract features from observations. Empirical evidence is provided to show the improvement of the proposed algorithms. (Short illustrative sketches of some of these ideas follow the metadata record below.)
dc.identifier.uri: http://hdl.handle.net/10012/15042
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: reinforcement learning
dc.subject: machine learning
dc.subject: exploration
dc.subject: exploitation
dc.subject: optimal learning
dc.subject: Bayesian reinforcement learning
dc.subject: model based reinforcement learning
dc.subject: neural network
dc.title: Optimal Learning Theory and Approximate Optimal Learning Algorithms
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws.contributor.advisor: Tripunitara, Mahesh
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text
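
The abstract above points to ε-greedy as the typical ad hoc exploration rule. For concreteness, a minimal sketch in Python, assuming tabular action-value estimates; the names epsilon_greedy, q_values, and epsilon are illustrative, not taken from the thesis:

import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon take a uniformly random action (explore);
    otherwise take the action with the highest value estimate (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(seed=0)
q_values = np.array([0.1, 0.5, 0.2])  # toy value estimates for three actions
actions = [epsilon_greedy(q_values, epsilon=0.1, rng=rng) for _ in range(20)]
print(actions)  # mostly the greedy action 1, occasionally a random one

The tuning burden the abstract criticizes lives in epsilon: too small and the agent under-explores, too large and it wastes interactions, which is exactly the dilemma optimal learning aims to resolve in a principled way.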
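
Principle (2), implicit approximate Bayesian inference via a pool of dynamics models, can be sketched as follows. This is a hedged illustration, not the thesis's implementation: linear models stand in for the prediction neural networks, bootstrap resampling with per-member least squares stands in for SGD training, and all names (EnsembleDynamics, n_models, sample_model) are invented for the example:

import numpy as np

class EnsembleDynamics:
    def __init__(self, n_models, state_dim, action_dim, rng):
        self.rng = rng
        self.in_dim = state_dim + action_dim
        # one weight matrix per member, mapping concat(state, action) -> next state
        self.weights = [rng.normal(0.0, 0.1, size=(self.in_dim, state_dim))
                        for _ in range(n_models)]

    def fit(self, inputs, targets):
        # each member is fit on its own bootstrap resample of the data,
        # so members disagree where data is scarce
        n = len(inputs)
        for i in range(len(self.weights)):
            idx = self.rng.integers(0, n, size=n)
            X, Y = inputs[idx], targets[idx]
            # ridge-regularized least squares stands in for SGD training
            self.weights[i] = np.linalg.solve(
                X.T @ X + 1e-3 * np.eye(self.in_dim), X.T @ Y)

    def sample_model(self):
        # drawing one member ~ drawing a dynamics model from an
        # approximate posterior (the posterior-sampling pattern)
        w = self.weights[self.rng.integers(len(self.weights))]
        return lambda state, action: np.concatenate([state, action]) @ w

rng = np.random.default_rng(0)
ens = EnsembleDynamics(n_models=5, state_dim=3, action_dim=1, rng=rng)
X = rng.normal(size=(256, 4))    # concatenated (state, action) inputs
Y = X @ rng.normal(size=(4, 3))  # synthetic next-state targets
ens.fit(X, Y)
model = ens.sample_model()       # resample once per episode
print(model(np.zeros(3), np.zeros(1)))

Sampling one member per episode and planning greedily under it is the posterior-sampling pattern: members that disagree on unvisited regions of the state space induce directed exploration there.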
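
Principle (3), extracting features from a non-Markovian observation stream with a memory-based recurrent network, amounts to conditioning a policy head on a recurrent summary of the history. A minimal PyTorch sketch; all layer sizes and names are assumptions for illustration:

import torch
import torch.nn as nn

obs_dim, hidden_dim, n_actions = 8, 32, 4  # illustrative sizes

# The GRU compresses the observation history into a hidden state that
# serves as an approximately sufficient feature for the policy.
encoder = nn.GRU(obs_dim, hidden_dim, batch_first=True)
policy_head = nn.Linear(hidden_dim, n_actions)

obs_seq = torch.randn(1, 10, obs_dim)    # one trajectory of 10 observations
features, h_last = encoder(obs_seq)      # features: (1, 10, hidden_dim)
action_logits = policy_head(h_last[-1])  # act on the latest memory state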

Files

Original bundle
Name: Song_Haobei.pdf
Size: 3.89 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 6.08 KB
Description: Item-specific license agreed upon to submission