Linearizing Contextual Multi-Armed Bandit Problems with Latent Dynamics

dc.contributor.author: Nelson, Elliot
dc.date.accessioned: 2022-02-10T21:24:32Z
dc.date.available: 2024-02-11T05:50:03Z
dc.date.issued: 2022-02-10
dc.date.submitted: 2022-01-19
dc.description.abstract: In many real-world applications of multi-armed bandit problems, both rewards and observed contexts are influenced by confounding latent variables that evolve stochastically over time. While the observed contexts and rewards are nonlinearly related, prior knowledge of latent graphical structure can be leveraged to reduce the problem to the linear bandit setting. We develop a linearized latent Thompson sampling algorithm (L2TS), which exploits prior knowledge of the dependence of observed contexts on the hidden state to build a least-squares estimator of the latent transition matrix, and uses the resulting approximate posterior beliefs over the latent space as context features in a linear bandit problem. We upper-bound the error in reward parameter estimates in our method, demonstrating the role of the latent dynamics and the evolution of posterior beliefs. We also demonstrate through experiments the superiority of our approach over related bandit algorithms. Lastly, we derive a theoretical bound that demonstrates the influence of the latent dynamics and the information-theoretic structure of the problem on Bayesian inference over the latent space. Overall, our approach uses prior knowledge to reduce a complex decision-making problem to a simpler problem for which existing solutions and methods can be applied. [en]
dc.identifier.uri: http://hdl.handle.net/10012/18068
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.subject: multi-armed bandits [en]
dc.subject: sequential decision making [en]
dc.subject: Bayesian inference [en]
dc.subject: latent variable models [en]
dc.subject: hidden Markov models [en]
dc.subject: linear regression [en]
dc.subject: Thompson sampling [en]
dc.subject: exploration [en]
dc.title: Linearizing Contextual Multi-Armed Bandit Problems with Latent Dynamics [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.degree.department: David R. Cheriton School of Computer Science [en]
uws-etd.degree.discipline: Computer Science [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 2 years [en]
uws.contributor.advisor: Poupart, Pascal
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]
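
Illustrative sketch (not taken from the thesis). The abstract describes using posterior beliefs over a latent state as context features in a linear bandit. The reason this linearizes the problem is that the expected reward of an arm given the history is the belief-weighted average of its per-state mean rewards, which is a linear function of the belief vector. The Python code below is a minimal sketch of that idea under strong simplifying assumptions: a two-state hidden Markov model, discrete observations, and a transition matrix that is known in advance (the thesis instead estimates it by least squares, which is omitted here). All names (belief_update, LinearThompsonArm, run) and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def belief_update(belief, transition, obs_likelihood):
    """One HMM forward-filter step.

    transition[i, j] = P(z' = j | z = i); obs_likelihood[j] = P(obs | z' = j).
    """
    predicted = transition.T @ belief          # propagate belief through the dynamics
    unnormalized = predicted * obs_likelihood  # reweight by the observation likelihood
    return unnormalized / unnormalized.sum()


class LinearThompsonArm:
    """Bayesian linear regression posterior for one arm (Gaussian prior and noise)."""

    def __init__(self, dim, prior_var=1.0, noise_var=1.0):
        self.precision = np.eye(dim) / prior_var  # posterior precision matrix
        self.b = np.zeros(dim)                    # precision-weighted mean
        self.noise_var = noise_var                # assumed reward noise variance

    def posterior(self):
        cov = np.linalg.inv(self.precision)
        return cov @ self.b, cov

    def sample(self, rng):
        mean, cov = self.posterior()
        return rng.multivariate_normal(mean, cov)

    def update(self, features, reward):
        self.precision += np.outer(features, features) / self.noise_var
        self.b += features * reward / self.noise_var


def run(steps=500, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical model parameters, assumed known for this sketch.
    transition = np.array([[0.9, 0.1], [0.2, 0.8]])
    emission = np.array([[0.8, 0.2], [0.3, 0.7]])     # emission[z, obs]
    mean_reward = np.array([[1.0, 0.0], [0.0, 1.0]])  # mean_reward[arm, z]

    arms = [LinearThompsonArm(dim=2) for _ in range(2)]
    belief = np.array([0.5, 0.5])
    state = 0
    total = 0.0
    for _ in range(steps):
        state = rng.choice(2, p=transition[state])  # latent state evolves
        obs = rng.choice(2, p=emission[state])      # observed context
        belief = belief_update(belief, transition, emission[:, obs])
        # Thompson sampling: draw reward parameters per arm, act greedily on the draw.
        scores = [arm.sample(rng) @ belief for arm in arms]
        choice = int(np.argmax(scores))
        reward = mean_reward[choice, state] + 0.1 * rng.normal()
        arms[choice].update(belief, reward)         # belief vector is the feature
        total += reward
    return belief, total / steps
```

Calling run() returns the final belief vector and the average reward per step. The sketch makes no claim about how the thesis's algorithm performs; it only shows how a filtered belief can stand in for the context in a standard linear Thompson sampling loop.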

Files

Original bundle

Name: Nelson_Elliot.pdf
Size: 1.34 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission