Real-time 3D Object Detection for Autonomous Driving

Date

2018-05-10

Authors

Mozifian, Melissa Farinaz

Advisor

Waslander, Steven

Publisher

University of Waterloo

Abstract

This thesis focuses on advancing the state of the art in 3D object detection and localization for autonomous driving. An autonomous vehicle must operate within a highly unpredictable and dynamic environment, so a robust perception system is essential. This work proposes AVOD, a novel Aggregate View Object Detection architecture for autonomous driving that is capable of generating accurate 3D bounding boxes in road scenes. AVOD uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second-stage detector network. The proposed RPN uses a novel architecture capable of performing multimodal feature fusion on high-resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes. Using these proposals, the second-stage detection network performs accurate oriented 3D bounding box regression and category classification to predict the extents, orientation, and class of objects in 3D space. AVOD differs from state-of-the-art methods in coupling a high-resolution feature extractor with a multimodal fusion RPN architecture, and is therefore able to produce accurate region proposals for small object classes in road scenes. AVOD also employs explicit orientation vector regression to resolve the ambiguous orientation estimate inferred from a bounding box. Experiments on the challenging KITTI dataset show that AVOD outperforms state-of-the-art detectors on the 3D localization, orientation estimation, and category classification tasks. AVOD is also shown to run in real time with a low memory overhead. The robustness of AVOD is demonstrated visually through deployment on our autonomous vehicle operating in low-light conditions, such as at night, as well as in snowy scenes. Finally, AVOD-SSD is proposed as a 3D single-stage detector. This work demonstrates how a single-stage detector can achieve accuracy similar to that of a two-stage detector. An analysis of the speed and accuracy trade-offs between AVOD and AVOD-SSD is presented.
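
The abstract mentions explicit orientation vector regression without giving its formulation. The following is a minimal sketch of the general idea, not the thesis implementation. It assumes the heading is encoded as a unit vector (cos θ, sin θ) and that a rectangular box footprint fixes the heading only up to a 180° flip. The function names are hypothetical.

```python
import math


def encode_orientation(theta):
    """Encode a heading angle as a unit vector (assumed encoding)."""
    return (math.cos(theta), math.sin(theta))


def wrap_angle(angle):
    """Wrap an angle into the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def resolve_box_heading(box_angle, orientation_vec):
    """Choose the box heading that agrees with a regressed orientation vector.

    box_angle is the heading implied by the box footprint, which is assumed
    here to be ambiguous by 180 degrees. orientation_vec is the (x, y)
    vector a network might regress; it need not be exactly unit length.
    """
    x, y = orientation_vec
    target = math.atan2(y, x)
    # The two headings consistent with the same rectangular footprint.
    candidates = [box_angle, box_angle + math.pi]
    # Keep the candidate with the smallest wrapped angular distance.
    best = min(candidates, key=lambda c: abs(wrap_angle(c - target)))
    return wrap_angle(best)


if __name__ == "__main__":
    # The footprint suggests 0.1 rad, but the vector points roughly backwards,
    # so the flipped heading is selected.
    print(resolve_box_heading(0.1, encode_orientation(3.0)))
```

Under these assumptions, the vector acts only as a tie-breaker between the two candidates, so the final angle still comes from the box geometry rather than directly from the regressed vector.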

Keywords

Computer Vision, Object Detection, Deep Learning