Show simple item record

dc.contributor.authorKhan, Omar Zia 19:13:12 (GMT) 19:13:12 (GMT)
dc.description.abstractDecision-theoretic systems, such as Markov Decision Processes (MDPs), are used for sequential decision-making under uncertainty. MDPs provide a generic framework that can be applied in various domains to compute optimal policies. This thesis presents techniques that offer explanations of optimal policies for MDPs and then refine decision theoretic models (Bayesian networks and MDPs) based on feedback from experts. Explaining policies for sequential decision-making problems is difficult due to the presence of stochastic effects, multiple possibly competing objectives and long-range effects of actions. However, explanations are needed to assist experts in validating that the policy is correct and to help users in developing trust in the choices recommended by the policy. A set of domain-independent templates to justify a policy recommendation is presented along with a process to identify the minimum possible number of templates that need to be populated to completely justify the policy. The rejection of an explanation by a domain expert indicates a deficiency in the model which led to the generation of the rejected policy. Techniques to refine the model parameters such that the optimal policy calculated using the refined parameters would conform with the expert feedback are presented in this thesis. The expert feedback is translated into constraints on the model parameters that are used during refinement. These constraints are non-convex for both Bayesian networks and MDPs. For Bayesian networks, the refinement approach is based on Gibbs sampling and stochastic hill climbing, and it learns a model that obeys expert constraints. For MDPs, the parameter space is partitioned such that alternating linear optimization can be applied to learn model parameters that lead to a policy in accordance with expert feedback. In practice, the state space of MDPs can often be very large, which can be an issue for real-world problems. Factored MDPs are often used to deal with this issue. In Factored MDPs, state variables represent the state space and dynamic Bayesian networks model the transition functions. This helps to avoid the exponential growth in the state space associated with large and complex problems. The approaches for explanation and refinement presented in this thesis are also extended for the factored case to demonstrate their use in real-world applications. The domains of course advising to undergraduate students, assisted hand-washing for people with dementia and diagnostics for manufacturing are used to present empirical evaluations.en
dc.publisherUniversity of Waterlooen
dc.subjectDecision-Theoretic Planningen
dc.subjectMarkov Decision Processesen
dc.subjectSequential Decision Makingen
dc.subjectReasoning under Uncertaintyen
dc.subjectBayesian Networksen
dc.subjectPolicy Explanationen
dc.subjectParameter Learningen
dc.subjectModel Refinementen
dc.titlePolicy Explanation and Model Refinement in Decision-Theoretic Planningen
dc.typeDoctoral Thesisen
dc.subject.programComputer Scienceen of Computer Scienceen
uws-etd.degreeDoctor of Philosophyen

Files in this item


This item appears in the following Collection(s)

Show simple item record


University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages