dc.contributor.authorBhalla, Sushrut
dc.description.abstractDeep Learning and back-propagation have been successfully used to perform centralized training with communication protocols among multiple agents in a cooperative Multi-Agent Deep Reinforcement Learning (MARL) environment. In this work, I present techniques for centralized training of MARL agents in large-scale environments and compare my work against current state-of-the-art techniques. This work uses a model-free Deep Q-Network (DQN) as the baseline model and allows inter-agent communication for cooperative policy learning. I present two novel, scalable, centralized MARL training techniques (MA-MeSN and MA-BoN), developed under the principle that the behavior policy and the message/communication policy have different optimization criteria. Accordingly, the models presented here separate the message learning module from the behavior policy learning module. As the experiments show, separating these modules leads to faster convergence in complex domains such as autonomous driving simulators and achieves better results than current techniques in the literature. Subsequently, this work presents two novel techniques for achieving decentralized execution of the communication-based cooperative policy. The first technique uses behavior cloning to transfer an expert cooperative policy to a decentralized agent that does not share messages. In the second technique, the behavior policy is coupled with a memory module that is local to each model. The independent agents use this memory module to mimic the communication policies of other agents and thus generate an independent behavior policy. This decentralized approach causes minimal degradation of the overall cumulative reward achieved by the centralized policy. Using a fully decentralized approach allows us to address the challenges of noise and communication bottlenecks in real-time communication channels.
In this work, I theoretically and empirically compare the centralized and decentralized training algorithms to current research in the field of MARL. As part of this thesis, I also developed a large-scale multi-agent testing environment. It is a new OpenAI-Gym environment that can be used for large-scale multi-agent research, as it simulates multiple autonomous cars driving cooperatively on a highway in the presence of a bad actor. I compare the performance of the centralized algorithms to existing state-of-the-art algorithms, for example DIAL and IMS, based on the cumulative reward achieved per episode and other metrics. MA-MeSN and MA-BoN achieve a cumulative reward at least 263% higher than the reward achieved by DIAL and IMS. I also present an ablation study of the scalability of MA-BoN and show that the MA-MeSN and MA-BoN algorithms exhibit only a linear increase in inference time and number of trainable parameters, compared to a quadratic increase for DIAL.en
dc.publisherUniversity of Waterlooen
dc.subjectMachine Learningen
dc.subjectReinforcement Learningen
dc.subjectMulti-Agent Reinforcement Learningen
dc.titleDeep Multi Agent Reinforcement Learning for Autonomous Drivingen
dc.typeMaster Thesisen
dc.pendingfalse and Computer Engineeringen and Computer Engineeringen of Waterlooen
uws-etd.degreeMaster of Applied Scienceen
uws.contributor.advisorCrowley, Mark
uws.contributor.affiliation1Faculty of Engineeringen
