
dc.contributor.author: Balakrishnan, Aravind
dc.date.accessioned: 2020-01-24 16:51:41 (GMT)
dc.date.available: 2020-01-24 16:51:41 (GMT)
dc.date.issued: 2020-01-24
dc.date.submitted: 2020-01-21
dc.identifier.uri: http://hdl.handle.net/10012/15570
dc.description.abstract: The behaviour planning subsystem, which is responsible for high-level decision making and planning, is an important aspect of an autonomous driving system. There are advantages to using a learned behaviour planning system instead of traditional rule-based approaches. However, high-quality labelled data for training behaviour planning models is hard to acquire. Thus, reinforcement learning (RL), which can learn a policy from simulations, is a viable option for this problem. Unfortunately, modelling inaccuracies between the simulator and the target environment, called the ‘transfer gap’, hinder the deployment of such a policy in a real autonomous vehicle. High-fidelity simulators, which have a smaller transfer gap, come with large computational costs that are not favourable for RL training, so we often have to settle for a fast but lower-fidelity simulator that exacerbates the transfer learning problem. In this thesis, we study how a low-fidelity 2D simulator can be used in place of a slower 3D simulator for training RL behaviour planning models, and analyze the resulting policies in comparison with a rule-based approach. We develop WiseMove, an RL framework for autonomous driving research that supports hierarchical RL, to serve as the low-fidelity source simulator. A transfer learning scenario is set up from WiseMove to an Unreal-based simulator for the Autonomoose system to study and close the transfer gap. We find that perception errors in the target simulator contribute the most to the transfer gap. These errors, when naively modelled in WiseMove, yield a policy that performs better in the target simulator than a carefully constructed rule-based policy. Applying domain randomization to the environment yields an even better policy, and the final RL policy reduces failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies less on velocity than the rule-based algorithm does, since velocity measurements are unreliable in the target simulator. To understand the exact learned behaviour, we distill the RL policy into a decision tree to obtain an interpretable rule-based policy. We show that manually constructing a rule-based policy that handles perception errors efficiently is not trivial. Future work can explore more driving scenarios under fewer constraints to further validate this result.
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: reinforcement learning
dc.subject: autonomous driving
dc.subject: transfer learning
dc.subject.lcsh: Automated vehicles
dc.subject.lcsh: Machine learning
dc.title: Closing the Modelling Gap: Transfer Learning from a Low-Fidelity Simulator for Autonomous Driving
dc.type: Master Thesis
dc.pending: false
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Master of Mathematics
uws.contributor.advisor: Czarnecki, Krzysztof
uws.contributor.affiliation1: Faculty of Mathematics
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate
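
The abstract above reports that modelling perception errors in the low-fidelity simulator and then applying domain randomization to the environment produced the policy that transferred best. The sketch below is a minimal, hypothetical illustration of that idea for a Gym-style 2D driving environment; the class name, the observation layout, and the noise ranges are illustrative assumptions and are not taken from the WiseMove codebase.

# Minimal sketch of domain randomization over perception errors,
# assuming a Gym-style environment with a flat observation vector.
import numpy as np
import gym


class PerceptionNoiseWrapper(gym.ObservationWrapper):
    """Injects randomized sensor noise into observations.

    Noise magnitudes are re-sampled at every episode reset, so the policy
    is trained over a distribution of perception errors rather than a
    single fixed noise model (domain randomization).
    """

    def __init__(self, env, pos_std_range=(0.0, 0.5), vel_std_range=(0.0, 1.0)):
        super().__init__(env)
        self.pos_std_range = pos_std_range
        self.vel_std_range = vel_std_range
        self._sample_noise()

    def _sample_noise(self):
        # Draw per-episode noise levels uniformly from the given ranges.
        self.pos_std = np.random.uniform(*self.pos_std_range)
        self.vel_std = np.random.uniform(*self.vel_std_range)

    def reset(self, **kwargs):
        self._sample_noise()
        return super().reset(**kwargs)

    def observation(self, obs):
        # Illustrative assumption: the first half of the observation vector
        # holds positions and the second half velocities; corrupt both with
        # Gaussian noise of the sampled magnitude.
        noisy = np.array(obs, dtype=np.float32)
        half = noisy.shape[0] // 2
        noisy[:half] += np.random.normal(0.0, self.pos_std, size=half)
        noisy[half:] += np.random.normal(0.0, self.vel_std, size=noisy.shape[0] - half)
        return noisy

Training against an environment wrapped this way exposes the policy to a range of perception-error severities, which is one way the robustness described in the abstract (failures due to perception errors dropping from 10% to 2.75%) could be pursued.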

