Generative Models for Planning and Decision-Making

dc.contributor.author: Karthikeyan, Akash
dc.date.accessioned: 2025-08-14T20:04:33Z
dc.date.available: 2025-08-14T20:04:33Z
dc.date.issued: 2025-08-14
dc.date.submitted: 2025-08-08
dc.description.abstract: Generative models have achieved remarkable progress across domains such as vision and language, yet their application to sequential decision-making and planning remains challenging. In reinforcement learning and robotics, agents must handle task hierarchies and long-horizon dependencies, adapt to harder unseen tasks and environments, and, especially in multi-agent settings, respond to adversarial or evolving opponents. Despite progress in behavioral cloning and offline policy learning, existing approaches often struggle to generalize beyond the training distribution or to learn robust, interactive behaviors in competitive games. These limitations restrict current systems to narrow tasks with short temporal horizons or to deterministic settings. For instance, behavioral planners trained on single-goal environments struggle to scale to multi-task missions requiring subgoal discovery and adaptive reasoning, as there is no straightforward mechanism for iterative test-time adaptation to unseen tasks. Similarly, in multi-agent reinforcement learning, standard policy optimization often yields unimodal, brittle strategies that overfit to specific opponents and fail to converge to a Nash equilibrium in continuous state-action games. This thesis explores challenges and opportunities in using generative models for planning and decision-making, focusing on energy-based and diffusion-based models that serve as both representations and solvers for planning and policy learning. In the single-agent setting, we introduce GenPlan, a discrete-flow planner that reframes planning as iterative denoising over trajectories using an energy-guided diffusion process. This formulation enables task and goal discovery, as well as adaptation to unseen environments. In the multi-agent setting, we propose DiffFP, a diffusion policy gradient method within the fictitious play framework. By approximating best responses through diffusion models, DiffFP captures multimodal strategies, improves sample efficiency, and remains robust to evolving opponents in dynamic, continuous state-action games. Our empirical studies show that GenPlan outperforms baselines by over 10% on adaptive planning tasks, generalizing from single-task demonstrations to complex, compositional multi-task missions. Likewise, DiffFP achieves up to 3× faster convergence and 30× higher success rates than baseline reinforcement learning algorithms on multi-agent benchmarks. These results demonstrate the potential of generative modeling not only for representation learning, but as a unified substrate for planning, learning, and decision-making across settings.
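The abstract's view of planning as iterative denoising over trajectories under an energy guide can be illustrated with a small, self-contained sketch. This is not the thesis implementation (see the linked GenPlan repository for that): the quadratic goal-plus-smoothness energy, the annealed noise schedule, and the numerical gradient are all illustrative assumptions chosen to keep the example runnable.

```python
import numpy as np

def energy(traj, goal):
    # Toy energy (assumption, not the thesis objective): penalize the final
    # state's distance to the goal, plus a smoothness term on the trajectory.
    return np.sum((traj[-1] - goal) ** 2) + 0.1 * np.sum(np.diff(traj, axis=0) ** 2)

def grad_energy(traj, goal, eps=1e-4):
    # Forward-difference numerical gradient of the energy w.r.t. every
    # trajectory coordinate (fine for this tiny example).
    base = energy(traj, goal)
    g = np.zeros_like(traj)
    for idx in np.ndindex(traj.shape):
        t = traj.copy()
        t[idx] += eps
        g[idx] = (energy(t, goal) - base) / eps
    return g

def denoise_plan(goal, horizon=16, dim=2, steps=300, lr=0.1, seed=0):
    # Energy-guided iterative denoising: start from pure noise and refine the
    # whole trajectory with gradient steps toward lower energy, injecting
    # noise that anneals to zero (a Langevin-style schedule).
    rng = np.random.default_rng(seed)
    traj = rng.normal(size=(horizon, dim))
    for k in range(steps):
        noise_scale = 0.1 * (1.0 - k / steps)
        traj = traj - lr * grad_energy(traj, goal) + noise_scale * rng.normal(size=traj.shape)
    return traj

goal = np.array([1.0, 1.0])
plan = denoise_plan(goal)  # (16, 2) trajectory whose endpoint approaches the goal
```

The design point the sketch makes is the one the abstract emphasizes: the plan is not rolled out step by step but refined as a whole, so swapping the energy (e.g. to encode a different subgoal) changes the plan at test time without retraining.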
dc.identifier.uri: https://hdl.handle.net/10012/22172
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.relation.uri: https://github.com/CL2-UWaterloo/GenPlan/
dc.relation.uri: https://github.com/CL2-UWaterloo/DiffFP/
dc.subject: generative models
dc.subject: reinforcement learning
dc.subject: diffusion model
dc.title: Generative Models for Planning and Decision-Making
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 1 year
uws.comment.hidden: Full Name: Akash Karthikeyan; Student ID: 21104705
uws.contributor.advisor: Pant, Yash
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle

Name: Karthikeyan_Akash.pdf
Size: 5.02 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission