Learning to Reach Goals from Suboptimal Demonstrations via World Models

dc.contributor.author: Ali, Qasim
dc.date.accessioned: 2026-01-14T15:20:53Z
dc.date.available: 2026-01-14T15:20:53Z
dc.date.issued: 2026-01-14
dc.date.submitted: 2026-01-12
dc.description.abstract: A central challenge in training autonomous agents is the scarcity of high-quality, long-horizon demonstrations. Unlike fields such as natural language processing or computer vision, where abundant internet data exists, many robotics and decision-making domains lack large, diverse, and high-quality datasets. One underutilized resource is suboptimal demonstrations, which are easier to collect and potentially far more abundant. This limitation is particularly pronounced in goal-conditioned reinforcement learning (GCRL), where agents must learn to reach diverse goal states from limited demonstrations. While methods such as contrastive reinforcement learning (CRL) show promising scaling behavior given abundant, high-quality training demonstrations, they struggle when demonstrations are suboptimal. In particular, when training demonstrations are short or exploratory, CRL generalizes poorly beyond them, and the resulting policy achieves lower success rates. To overcome this, we explore self-supervised representation learning as a way to extract general-purpose representations from demonstrations. The intuition is that if an agent can first learn robust representations of environment dynamics, without relying on demonstration optimality, it can then use these representations to guide reinforcement learning more effectively. Such representations can serve as a bridge between noisy demonstrations and goal-directed control, allowing policies to learn faster. In this thesis, we propose World Model Contrastive Reinforcement Learning (WM-CRL), which augments CRL with representations from a world model (WM). The world model is trained to predict future state embeddings from past state-action pairs, thereby encoding the dynamics of the environment. Because the world model aims only to learn environment dynamics, it can leverage both high- and low-quality demonstrations. Integrating these world-model embeddings into CRL's framework helps the agent comprehend the environment dynamics and select actions that more effectively achieve its goals. We evaluate WM-CRL on tasks from the OGBench benchmark, spanning multiple locomotion and manipulation environments and multiple datasets of varying quality. Our results show that WM-CRL substantially improves over CRL in suboptimal-data settings, such as stitching short trajectories or learning from exploratory behavior, but offers limited benefit when abundant expert demonstrations are available. Ablation studies further reveal that success depends critically on the stability of world-model training and on how its embeddings are integrated into the agent's architecture.
dc.identifier.uri: https://hdl.handle.net/10012/22823
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: reinforcement learning
dc.subject: world models
dc.subject: representation learning
dc.subject: model-based reinforcement learning
dc.subject: contrastive learning
dc.subject: self-supervised learning
dc.title: Learning to Reach Goals from Suboptimal Demonstrations via World Models
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Systems Design Engineering
uws-etd.degree.discipline: Systems Design Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Wong, Alexander
uws.contributor.advisor: Shafiee, Javad
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text
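
Note: the abstract above describes the world model as being trained to predict future state embeddings from past state-action pairs, using transitions of any quality. The following Python sketch illustrates one minimal form such an objective could take; it is not the thesis's implementation, and the architectures, the MSE latent-prediction loss, the stop-gradient on the target, and the names Encoder, LatentDynamics, and world_model_loss are all assumptions for illustration.

    # Minimal sketch (assumptions, not the thesis implementation): a world
    # model trained to predict future state embeddings from state-action pairs.
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """Maps raw states to embeddings (hypothetical architecture)."""
        def __init__(self, state_dim: int, embed_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 256), nn.ReLU(),
                nn.Linear(256, embed_dim),
            )

        def forward(self, s):
            return self.net(s)

    class LatentDynamics(nn.Module):
        """Predicts the next embedding from the current embedding and action."""
        def __init__(self, embed_dim: int, action_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(embed_dim + action_dim, 256), nn.ReLU(),
                nn.Linear(256, embed_dim),
            )

        def forward(self, z, a):
            return self.net(torch.cat([z, a], dim=-1))

    def world_model_loss(encoder, dynamics, s, a, s_next):
        """Latent prediction loss on (s, a, s_next) transitions.

        Demonstration optimality is irrelevant here: any observed
        transition is a valid sample of the environment dynamics.
        """
        z, z_next = encoder(s), encoder(s_next)
        z_pred = dynamics(z, a)
        # Stop gradients through the target embedding, a common stabilizer
        # in self-supervised latent-dynamics training (an assumption here).
        return nn.functional.mse_loss(z_pred, z_next.detach())

In a WM-CRL-style setup, the resulting embeddings would then be provided to CRL's actor and critic alongside the goal representation; as the abstract notes, the stability of this world-model training and the choice of integration point largely determine whether the embeddings help.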

Files

Original bundle

Name: Ali_Qasim.pdf
Size: 8.41 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission
