
dc.contributor.author: Balakrishnan, Aravind
dc.date.accessioned: 2020-01-24 16:51:41 (GMT)
dc.date.available: 2020-01-24 16:51:41 (GMT)
dc.date.issued: 2020-01-24
dc.date.submitted: 2020-01-21
dc.identifier.uri: http://hdl.handle.net/10012/15570
dc.description.abstract: The behaviour planning subsystem, which is responsible for high-level decision making and planning, is an important aspect of an autonomous driving system. There are advantages to using a learned behaviour planning system instead of traditional rule-based approaches. However, high-quality labelled data for training behaviour planning models is hard to acquire. Thus, reinforcement learning (RL), which can learn a policy from simulations, is a viable option for this problem. Unfortunately, modelling inaccuracies between the simulator and the target environment, called the ‘transfer gap’, hinder the deployment of such a policy in a real autonomous vehicle. High-fidelity simulators, which have a smaller transfer gap, come with large computational costs that are not favourable for RL training, so we often have to settle for a fast but lower-fidelity simulator that exacerbates the transfer learning problem. In this thesis, we study how a low-fidelity 2D simulator can be used in place of a slower 3D simulator for training RL behaviour planning models, and analyze the resulting policies in comparison with a rule-based approach. We develop WiseMove, an RL framework for autonomous driving research that supports hierarchical RL, to serve as the low-fidelity source simulator. A transfer learning scenario is set up from WiseMove to an Unreal-based simulator for the Autonomoose system to study and close the transfer gap. We find that perception errors in the target simulator contribute the most to the transfer gap. These errors, when naively modelled in WiseMove, yield a policy that performs better in the target simulator than a carefully constructed rule-based policy. Applying domain randomization to the environment yields an even better policy, and the final RL policy reduces failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies less on velocity than the rule-based algorithm does, since velocity measurements are unreliable in the target simulator. To understand the exact learned behaviour, we distill the RL policy into a decision tree to obtain an interpretable rule-based policy. We show that manually constructing a rule-based policy that handles perception errors efficiently is not trivial. Future work can explore more driving scenarios under fewer constraints to further validate this result.
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: reinforcement learning
dc.subject: autonomous driving
dc.subject: transfer learning
dc.subject.lcsh: Automated vehicles
dc.subject.lcsh: Machine learning
dc.title: Closing the Modelling Gap: Transfer Learning from a Low-Fidelity Simulator for Autonomous Driving
dc.type: Master Thesis
dc.pending: false
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Master of Mathematics
uws.contributor.advisor: Czarnecki, Krzysztof
uws.contributor.affiliation1: Faculty of Mathematics
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate
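
The abstract above reports that modelling perception errors in the low-fidelity simulator and then applying domain randomization to the environment produced the policy that transferred best. The sketch below is a minimal, hypothetical illustration of that idea for a Gym-style 2D driving environment; the class name, the observation layout, and the noise ranges are illustrative assumptions and are not taken from the WiseMove codebase.

# Minimal sketch of domain randomization over perception errors,
# assuming a Gym-style environment with a flat observation vector.
import numpy as np
import gym


class PerceptionNoiseWrapper(gym.ObservationWrapper):
    """Injects randomized sensor noise into observations.

    Noise magnitudes are re-sampled at every episode reset, so the policy
    is trained over a distribution of perception errors rather than a
    single fixed noise model (domain randomization).
    """

    def __init__(self, env, pos_std_range=(0.0, 0.5), vel_std_range=(0.0, 1.0)):
        super().__init__(env)
        self.pos_std_range = pos_std_range
        self.vel_std_range = vel_std_range
        self._sample_noise()

    def _sample_noise(self):
        # Draw per-episode noise levels uniformly from the given ranges.
        self.pos_std = np.random.uniform(*self.pos_std_range)
        self.vel_std = np.random.uniform(*self.vel_std_range)

    def reset(self, **kwargs):
        self._sample_noise()
        return super().reset(**kwargs)

    def observation(self, obs):
        # Illustrative assumption: the first half of the observation vector
        # holds positions and the second half velocities; corrupt both with
        # Gaussian noise of the sampled magnitude.
        noisy = np.array(obs, dtype=np.float32)
        half = noisy.shape[0] // 2
        noisy[:half] += np.random.normal(0.0, self.pos_std, size=half)
        noisy[half:] += np.random.normal(0.0, self.vel_std, size=noisy.shape[0] - half)
        return noisy

Training against an environment wrapped this way exposes the policy to a range of perception-error severities, which is one way the robustness described in the abstract (failures due to perception errors dropping from 10% to 2.75%) could be pursued.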

