Generative Models for Planning and Decision-Making

Date

2025-08-14

Advisor

Pant, Yash

Publisher

University of Waterloo

Abstract

Generative models have achieved remarkable progress across domains such as vision and language, yet their application to sequential decision-making and planning remains challenging. In reinforcement learning and robotics, agents must handle task hierarchies and long-horizon dependencies, adapt to harder, unseen tasks and environments, and, especially in multi-agent settings, respond to adversarial or evolving opponents. Despite progress in behavioral cloning and offline policy learning, existing approaches often struggle to generalize beyond the training distribution or to learn robust, interactive behaviors in competitive games. These limitations restrict current systems to narrow tasks with short temporal horizons or to deterministic settings. For instance, behavioral planners trained on single-goal environments struggle to scale to multi-task missions requiring subgoal discovery and adaptive reasoning, as there is no straightforward mechanism for iterative test-time adaptation to these unseen tasks. Similarly, in multi-agent reinforcement learning, standard policy optimization often yields unimodal, brittle strategies that overfit to specific opponents and fail to converge to a Nash equilibrium in continuous state-action games.

This thesis explores challenges and opportunities in using generative models for planning and decision-making, focusing on energy-based and diffusion-based models that serve as both representations and solvers for planning and policy learning. In the single-agent setting, we introduce GenPlan, a discrete-flow planner that reframes planning as iterative denoising over trajectories using an energy-guided diffusion process. This formulation enables task and goal discovery as well as adaptation to unseen environments. In the multi-agent setting, we propose DiffFP, a diffusion policy gradient method within the fictitious play framework. By approximating best responses with diffusion models, DiffFP captures multimodal strategies, improves sample efficiency, and remains robust to evolving opponents in dynamic, continuous state-action games.

Our empirical studies show that GenPlan outperforms baselines by over 10% on adaptive planning tasks, generalizing from single-task demonstrations to complex, compositional multi-task missions. Likewise, DiffFP achieves up to 3× faster convergence and 30× higher success rates than baseline reinforcement learning algorithms on multi-agent benchmarks. These results demonstrate the potential of generative modeling not only for representation learning but also as a unified substrate for planning, learning, and decision-making across settings.
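To make the single-agent idea concrete, below is a minimal, illustrative sketch of energy-guided iterative denoising over trajectories in the spirit of the planner described above. It is not the thesis's actual model or API: the denoiser is a hand-coded stand-in for a trained diffusion or flow model, and the energy function and parameter names (energy, denoise_step, guidance_scale) are assumptions made for illustration.

# Illustrative sketch only: energy-guided trajectory denoising.
# All names and the toy energy are assumptions, not GenPlan's actual code.
import torch

def energy(traj: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
    # Hypothetical task energy: distance of the final state to the goal
    # plus a crude smoothness penalty on step-to-step jumps.
    goal_cost = ((traj[-1] - goal) ** 2).sum()
    smooth_cost = ((traj[1:] - traj[:-1]) ** 2).sum()
    return goal_cost + 0.1 * smooth_cost

def denoise_step(traj: torch.Tensor) -> torch.Tensor:
    # Stand-in for a learned denoiser: gently shrink residual noise.
    # A trained diffusion/flow model would be called here instead.
    return 0.98 * traj

def plan(goal, horizon=32, state_dim=2, num_steps=50, guidance_scale=0.05):
    traj = torch.randn(horizon, state_dim)  # start from pure noise
    for _ in range(num_steps):
        traj = traj.detach().requires_grad_(True)
        e = energy(traj, goal)
        grad, = torch.autograd.grad(e, traj)   # energy guidance direction
        with torch.no_grad():
            traj = denoise_step(traj)          # denoising update
            traj = traj - guidance_scale * grad  # steer toward low energy
    return traj.detach()

if __name__ == "__main__":
    goal = torch.tensor([1.0, 1.0])
    print(plan(goal)[-1])  # the final state drifts toward the goal

The key design point this sketch mirrors is that the task objective enters only through the energy gradient, so the same denoising loop can be reused for new goals or tasks at test time without retraining.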
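The multi-agent method builds on fictitious play, in which each player repeatedly best-responds to the opponent's empirical average strategy. The toy sketch below shows that loop for a two-player zero-sum matrix game (rock-paper-scissors), where convergence of the average strategies is guaranteed; per the abstract, DiffFP replaces the exact best-response step here with one approximated by a diffusion policy in continuous state-action games, which this sketch does not implement.

# Illustrative sketch only: the fictitious-play loop underlying DiffFP,
# shown for a toy matrix game rather than a continuous game.
import numpy as np

A = np.array([[0.0, -1.0, 1.0],    # payoff matrix for player 1
              [1.0, 0.0, -1.0],    # (rock-paper-scissors)
              [-1.0, 1.0, 0.0]])

def best_response(expected_payoffs: np.ndarray) -> np.ndarray:
    # Exact pure-strategy best response to the opponent's average strategy.
    # DiffFP would instead sample this from a learned diffusion policy.
    br = np.zeros(len(expected_payoffs))
    br[np.argmax(expected_payoffs)] = 1.0
    return br

def fictitious_play(iters: int = 5000):
    avg1 = np.ones(3) / 3  # empirical average strategy of player 1
    avg2 = np.ones(3) / 3  # empirical average strategy of player 2
    for t in range(1, iters + 1):
        br1 = best_response(A @ avg2)      # player 1 vs. opponent average
        br2 = best_response(-A.T @ avg1)   # player 2 (zero-sum payoffs)
        avg1 += (br1 - avg1) / t           # running average of responses
        avg2 += (br2 - avg2) / t
    return avg1, avg2

if __name__ == "__main__":
    s1, s2 = fictitious_play()
    print(s1, s2)  # averages approach the uniform Nash equilibrium

In continuous state-action games the argmax above is intractable and a single deterministic response would collapse multimodal play, which is why the thesis approximates best responses with diffusion models instead.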

Keywords

generative models, reinforcement learning, diffusion models
