Prediction and Planning in Dynamical Systems with Underlying Markov Decision Processes
Predicting the future state of a scene with moving objects is a task that humans handle with ease, thanks to our understanding of the dynamics of the objects in the scene and the way they interact. Teaching machines such understanding, however, has long been a challenging task in machine learning. In recent years, with the abundance of data and the enormous growth in computational power, there has been outstanding progress in closing the gap between human and machine perception and prediction; deep learning, in particular, has been the main framework for addressing this problem. Prediction models are not only important in their own right: many downstream tasks in machine learning and robotics also rely on the quality of their outputs. Model-based control and planning, in particular, require an accurate model of the underlying dynamics of the system. A common assumption about these dynamics, and the main theme of this thesis, is that they can be expressed as a Markov Decision Process (MDP). The major portion of the thesis, however, is dedicated to problems in which we do not have access to the actual underlying MDP and only receive high-dimensional observations from the dynamical system. The objective is then to model the underlying dynamics from data and build a model that can be used for planning and control. We consider both single-agent and multi-agent systems and employ deep generative models to model the dynamics. For the single-agent problem, we propose a model that maps the high-dimensional observations to a low-dimensional space in which the dynamics of the system are modelled by a locally-linear function. We find this mapping through a principled treatment of the variables using graphical models and show that it is robust against dynamics noise and suitable for control.
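A minimal sketch of the single-agent idea described above: encode a high-dimensional observation into a low-dimensional latent state, then step the latent state forward with a locally-linear transition. All names, dimensions, and matrices here are illustrative placeholders, not the thesis's actual learned model (which fits the encoder and the local matrices with a deep generative model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the thesis).
z_dim, u_dim = 2, 1

def encode(x):
    """Stand-in encoder: maps a high-dimensional observation x to a latent z.
    In the thesis this mapping is learned; here it is a fixed projection."""
    W = np.eye(z_dim, x.shape[0])  # placeholder linear projection
    return W @ x

def locally_linear_step(z, u, A, B, o):
    """One latent transition z' = A z + B u + o.
    In a locally-linear model, A, B, o depend on the current (z, u)."""
    return A @ z + B @ u + o

# Example: identity dynamics with a constant control matrix.
x = rng.normal(size=8)            # fake high-dimensional observation
z = encode(x)
A = np.eye(z_dim)                 # local state matrix
B = np.ones((z_dim, u_dim))       # local control matrix
o = np.zeros(z_dim)               # local offset
z_next = locally_linear_step(z, np.array([0.5]), A, B, o)
```

Because the transition is linear around each latent point, standard tools such as LQR-style planners can be applied in the latent space, which is what makes such a mapping "suitable for control."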
For the multi-agent problem, we provide a formulation that describes the prediction problem in terms of the reaction of the environment to the actions of one agent (the ego-agent), and show that this formulation improves prediction accuracy and generalizes to a broader range of environment conditions. From a different perspective, we also consider the problem in which we do have access to the MDP and would like to obtain the optimal policy. More specifically, given a set of base policies on the MDP, we want to find the best policy in their convex hull. We show that this problem is NP-hard in general, and provide an approximation algorithm with linear complexity that outputs a policy performing close to the optimal one, under the condition that the base policies overlap in the occupancy measure space.
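One way to write the convex-hull planning problem sketched above. The particular mixture definition below (mixing policies at the action level rather than in occupancy-measure space) is an illustrative assumption, not necessarily the thesis's exact formulation:

```latex
% Given base policies \pi_1,\dots,\pi_K on the MDP, search the simplex
% \Delta_K for the best convex combination:
\max_{\alpha \in \Delta_K} V\bigl(\pi_\alpha\bigr),
\qquad
\pi_\alpha(a \mid s) = \sum_{k=1}^{K} \alpha_k \, \pi_k(a \mid s),
\qquad
\Delta_K = \Bigl\{ \alpha \in \mathbb{R}_{\ge 0}^{K} : \sum_{k=1}^{K} \alpha_k = 1 \Bigr\}.
```

Even though the feasible set $\Delta_K$ is convex, the value $V(\pi_\alpha)$ is in general a non-convex function of $\alpha$, which is consistent with the NP-hardness result stated above.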
Cite this version of the work
Seyedershad Banijamali (2021). Prediction and Planning in Dynamical Systems with Underlying Markov Decision Processes. UWSpace. http://hdl.handle.net/10012/17233