Author: Bradley, Matthew
Date: 2024-04-29 (issued 2024-04-17)
URI: http://hdl.handle.net/10012/20515

Simultaneous Localization and Mapping (SLAM) is a critical foundation for a wide variety of robotic applications. Visual SLAM systems rely on Visual Place Recognition (VPR) for map maintenance and loop closing, so their quality suffers when VPR performance degrades. In most VPR systems, images are described compactly and stored for later comparison, with a match indicating that a scene has been seen before and is being revisited. Changes in illumination are a common difficulty for VPR image descriptors based on vocabularies of local features. Global descriptors which incorporate high-level structure are more robust to illumination, but are often sensitive to changes in viewpoint. VPR research focuses largely on describing single images, despite the fact that SLAM systems recover 3D structure from the environment, and that this structure is both illumination invariant and remains the same regardless of vantage point. Prior work leveraging SLAM-recovered structure in the form of 3D points, in conjunction with LiDAR scan descriptors, has demonstrated superior VPR performance under harsh illumination compared with state-of-the-art (SoTA) visual-vocabulary descriptors, although its performance in general is not as high. A significant observed limitation was difficulty matching pseudo-LiDAR scans whose sub-regions differ substantially. This stems from an assumption made by the LiDAR descriptors used: that the entire volume of two corresponding scans should match. That assumption fits poorly with pointclouds accumulated during traversal by visual SLAM, which are sparse and incomplete due to differences in route, incomplete coverage, and the sparsity of SLAM feature tracking in general. What is needed is an approach that matches the sub-regions common to two pseudo-scans; in other words, an approach performing place recognition based on landmarks.
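The contrast drawn above can be sketched in a few lines: a whole-volume comparison of two sparse pseudo-scans scores poorly when their coverage differs, while pairing only the landmarks they share still recovers the common sub-region. This is a minimal, hypothetical illustration; the voxel size, matching radius, and greedy nearest-centroid pairing are assumptions for the sketch, not the method used in the thesis.

```python
# Illustrative sketch (assumed parameters): whole-volume overlap vs.
# landmark-level matching for sparse, partially overlapping pseudo-scans.

def voxelize(points, size=1.0):
    """Quantize 3D points into a set of occupied voxel cells."""
    return {(round(x / size), round(y / size), round(z / size))
            for x, y, z in points}

def whole_scan_overlap(scan_a, scan_b, size=1.0):
    """Whole-volume view: fraction of occupied voxels shared by both scans."""
    va, vb = voxelize(scan_a, size), voxelize(scan_b, size)
    return len(va & vb) / max(len(va | vb), 1)

def match_landmarks(lms_a, lms_b, radius=2.0):
    """Landmark view: greedily pair centroids lying within `radius`."""
    pairs, used = [], set()
    for i, a in enumerate(lms_a):
        best, best_d = None, radius
        for j, b in enumerate(lms_b):
            if j in used:
                continue
            d = sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs

# Two pseudo-scans sharing one region but diverging elsewhere:
# whole-volume overlap is low, yet the shared landmark is still paired.
scan_a = [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (10.0, 0.0, 0.0), (10.4, 0.0, 0.0)]
scan_b = [(0.2, 0.0, 0.0), (30.0, 0.0, 0.0), (30.4, 0.0, 0.0)]
overlap = whole_scan_overlap(scan_a, scan_b)
pairs = match_landmarks([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
                        [(0.2, 0.0, 0.0), (30.0, 0.0, 0.0)])
```

Under these toy inputs the whole-volume overlap is well below half, while the landmark matcher still finds the one common sub-region.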
Here we explore generation of landmarks from accumulated SLAM structure through various clustering-based techniques, as well as the application of SoTA Grassmannian graph-based association to match them. We present the challenges and successes of this approach to introducing 3D structure into VPR and propose several avenues of exploration to address the challenges faced. One of the foremost challenges is that pointclouds derived from SLAM are very sparse and uneven, making reliable, repeatable clustering difficult to achieve. We significantly improve landmark quality by using semantic labeling to provide better separation before clustering. While this noticeably reduces the number of outlier landmarks, we also find that the association method used is extremely sensitive to outliers. This sensitivity persists across datasets and appears inherent to this method of association, precluding effective place recognition at this time; in future work we expect it will be alleviated through the use of landmark descriptors for more effective outlier rejection. Descriptors can also provide putative associations, which can benefit landmark matching. We also propose various other enhancements to improve landmark generation and the association of landmarks for place recognition. It is our firm expectation that incorporating 3D structure from SLAM systems into underlying VPR will be mutually beneficial, with VPR systems gaining additional descriptive capability that is fully invariant to illumination and more stable than viewpoint-sensitive 2D image structure.

Language: en
Keywords: place recognition; simultaneous localization and mapping; visual place recognition; LiDAR; navigation; computer vision
Title: On Landmarks for Introducing 3D SLAM Structure to VPR
Type: Master Thesis
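The semantic-separation idea described above can be sketched as follows: clustering points within each semantic class prevents nearby points of different classes (say, a tree trunk against a building wall) from merging into one landmark. The labels, the distance threshold, and the simple single-linkage clustering are illustrative assumptions for this sketch, not the exact pipeline of the thesis.

```python
# Hypothetical sketch: semantic labels partition the pointcloud before
# clustering, so clusters never straddle two semantic classes.

def cluster(points, eps=1.5):
    """Single-linkage clustering via union-find: points within `eps` share a cluster."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            d = sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
            if d <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(points[i])
    return list(groups.values())

def semantic_landmarks(labeled_points, eps=1.5):
    """Cluster separately per semantic label; return (label, centroid) landmarks."""
    by_label = {}
    for label, p in labeled_points:
        by_label.setdefault(label, []).append(p)
    landmarks = []
    for label, pts in by_label.items():
        for group in cluster(pts, eps):
            centroid = tuple(sum(axis) / len(group) for axis in zip(*group))
            landmarks.append((label, centroid))
    return landmarks

# Two adjacent objects of different classes: label-blind clustering merges
# them into one group, semantic separation keeps them as two landmarks.
labeled = [("tree", (0.0, 0.0, 0.0)), ("tree", (0.5, 0.0, 0.0)),
           ("building", (1.0, 0.0, 0.0)), ("building", (1.5, 0.0, 0.0))]
```

Running `semantic_landmarks(labeled)` on this toy input yields two landmarks, whereas clustering the same four points without labels collapses them into a single group.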