SIVO: Semantically Informed Visual Odometry and Mapping

Loading...
Thumbnail Image

Date

2018-11-12

Authors

Ganti, Pranav

Advisor

Waslander, Steven

Journal Title

Journal ISSN

Volume Title

Publisher

University of Waterloo

Abstract

Accurate localization is a requirement for any autonomous mobile robot. In recent years, cameras have proven to be a reliable, cheap, and effective sensor to achieve this goal. Visual simultaneous localization and mapping (SLAM) algorithms determine camera motion by tracking the motion of reference points from the scene. However, these references must be static, as well as viewpoint, scale, and rotation invariant in order to ensure accurate localization. This is especially paramount for long-term robot operation, where we require our references to be stable over long durations and also require careful point selection to maintain the runtime and storage complexity of the algorithm while the robot navigates through its environment. In this thesis, we present SIVO (Semantically Informed Visual Odometry and Mapping), a novel feature selection method for visual SLAM which incorporates machine learning and neural network uncertainty into an information-theoretic approach to feature selection. The emergence of deep learning techniques has resulted in remarkable advances in scene understanding, and our method supplements traditional visual SLAM with this contextual knowledge. Our algorithm selects points which provide significant information to reduce the uncertainty of the state estimate while ensuring that the feature is detected to be a static object repeatedly, with a high confidence. This is done by evaluating the reduction in Shannon entropy between the current state entropy, and the joint entropy of the state given the addition of the new feature with the classification entropy of the feature from a Bayesian neural network. Our method is evaluated against ORB SLAM2 and the ground truth of the KITTI odometry dataset. Overall, SIVO performs comparably to ORB SLAM2 (average of 0.17% translation error difference, 6.2 × 10 −5 deg/m rotation error difference) while removing 69% of the map points on average. As the reference points selected are from static objects (building, traffic signs, etc.), the map generated using our algorithm is suitable for long-term localization.

Description

Keywords

localization, mapping, SLAM, deep learning, machine learning, information theory, semantic segmentation, visual odometry

LC Subject Headings

Citation