Techniques to learn constraints from demonstrations
Date
2025-05-27
Authors
Advisor
Poupart, Pascal
Publisher
University of Waterloo
Abstract
Given demonstrations from an optimal expert, inverse reinforcement learning aims to learn an underlying reward function. However, assuming that the reward function fully explains the expert's behaviour is limiting, since in many real-world settings the expert may also be acting to satisfy additional behavioural constraints. Recovering these additional constraints falls within the paradigm of constraint learning from demonstrations. Specifically, in this work, we focus on the setting of inverse constraint learning (ICL), where we wish to learn a single but arbitrarily complex constraint from demonstrations, assuming the reward is known in advance.
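As a rough sketch of the setting, assuming a standard constrained Markov decision process formulation (the notation below is illustrative, not necessarily the thesis'), the expert is modelled as optimising the known reward $r$ subject to a constraint defined by an unknown cost $c$ and budget $\beta$; ICL then seeks a cost function under which the constrained-optimal behaviour matches the demonstrations:

```latex
% Illustrative notation (assumed): \pi^{E} is the expert policy, r the known
% reward, c the unknown cost, and \beta a cost budget.
\pi^{E} \in \arg\max_{\pi}\;
  \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t} r(s_t, a_t)\right]
\quad \text{s.t.} \quad
  \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t} c(s_t, a_t)\right] \le \beta .
```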
For this setting, we first provide a framework to learn an expected constraint from constrained expert demonstrations. We then show how to translate an expected constraint into a probabilistic constraint and extend the proposed framework to learn a probabilistic constraint from constrained expert demonstrations. Here, an expected constraint refers to a constraint that bounds the cumulative cost, averaged over a batch of trajectories, to lie within a budget; a probabilistic constraint upper bounds the probability that the cumulative cost exceeds a given threshold. Finally, we provide convergence guarantees for the proposed frameworks.
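To make the two constraint types concrete, the following is a minimal sketch using assumed symbols (a budget $\beta$, a cost threshold $\alpha$, and a probability bound $\delta$, none of which are taken from the thesis): an expected constraint bounds the mean cumulative cost, while a probabilistic constraint bounds the tail probability of the cumulative cost.

```latex
% Expected constraint: the average cumulative cost is within the budget \beta,
\mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t} c(s_t, a_t)\right] \le \beta .
% Probabilistic constraint: the probability that the cumulative cost exceeds
% the threshold \alpha is at most \delta,
\Pr_{\tau \sim \pi}\!\left(\sum_{t} c(s_t, a_t) \ge \alpha\right) \le \delta .
```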
Following these approaches, we consider the complementary challenge of learning a constraint in a high-dimensional state-action space. In such a setting, the constraint function may in fact depend on only a subset of the input features. We propose using a simple test from the hypothesis testing literature to select this subset of features and thereby construct a reduced input space for the constraint function. We also discuss the implications of using this approach in conjunction with an ICL algorithm.
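The thesis does not name the specific test in the abstract, so the sketch below is only illustrative: it uses a two-sample Kolmogorov-Smirnov test (one possible "simple test from the hypothesis testing literature") to keep the state-action features whose marginal distributions differ between expert trajectories and trajectories from a nominal, reward-only policy. The function name, the comparison data, and the significance threshold are all assumptions for the example.

```python
# Minimal sketch of hypothesis-test-based feature selection for constraint learning.
# The KS test, the expert-vs-nominal comparison, and alpha=0.05 are illustrative
# assumptions, not the method described in the thesis.
import numpy as np
from scipy.stats import ks_2samp


def select_constraint_features(expert_data: np.ndarray,
                               nominal_data: np.ndarray,
                               alpha: float = 0.05) -> list[int]:
    """Return indices of features whose marginal distributions differ
    significantly between expert and nominal (reward-only) trajectories.

    Both inputs have shape (num_samples, num_features), each row a flattened
    state-action pair visited by the respective policy.
    """
    selected = []
    for j in range(expert_data.shape[1]):
        # Two-sample KS test on the j-th feature's marginal distribution.
        _, p_value = ks_2samp(expert_data[:, j], nominal_data[:, j])
        if p_value < alpha:
            # A significant difference suggests the constraint shapes this
            # feature, so keep it in the reduced input space.
            selected.append(j)
    return selected


# Example usage with synthetic data: only feature 0 differs between policies.
rng = np.random.default_rng(0)
expert = np.column_stack([rng.normal(2.0, 1.0, 500), rng.normal(0.0, 1.0, 500)])
nominal = np.column_stack([rng.normal(0.0, 1.0, 500), rng.normal(0.0, 1.0, 500)])
print(select_constraint_features(expert, nominal))  # expected output: [0]
```

The reduced input space returned by such a test would then be fed to the downstream ICL algorithm in place of the full state-action vector.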
To validate the proposed approaches, we conduct experiments with synthetic environments, robotics environments, and environments based on real-world driving datasets. For feature selection, we test our approach on environments with varying state-action space sizes.
Keywords
reinforcement learning, inverse reinforcement learning, machine learning, constraint learning