Advancements in Road Lane Mapping: Comparative Analysis of Deep Learning-based Semantic Segmentation Methods Using Aerial Imagery

Liu, Xuanchen (Willow)



Date

2024-05-01


Advisor

Li, Jonathan

Publisher

University of Waterloo

Abstract

The rapid advancement of autonomous vehicles (AVs) underscores the need for high-definition (HD) maps, in which road lane information is crucial for navigation. The widespread availability of Earth observation data, including aerial imagery, provides invaluable resources for constructing these maps. However, fully exploiting aerial imagery for HD road map creation requires leveraging artificial intelligence (AI) and deep learning. The field of computer vision has made significant progress with the introduction of advanced semantic segmentation models, yet the remote sensing community has not fully explored the development of specialized models for road lane extraction. This research presents a comprehensive comparative analysis of twelve deep learning-based semantic segmentation models, specifically assessing their ability to extract road lane markings, with particular emphasis on a novel dataset characterized by partially labeled instances. The investigation examines the models' performance in scenarios with minimal labeled data, evaluating their efficiency, accuracy, and adaptability under limited annotation and transfer learning. The results highlight a distinct advantage of Transformer-based models over their Convolutional Neural Network (CNN) counterparts for extracting road lanes from aerial imagery. Among the state-of-the-art models evaluated, the Segmenting Transformer (SegFormer), the Shifted Window (Swin) Transformer, and the Twins Spatially Separable Vision Transformer (Twins-SVT) exhibit superior performance. The empirical results on the Waterloo Urban Scene dataset show substantial progress, with mean Intersection over Union (mIoU) scores ranging from 33.56% to 76.11%, precision from 64.33% to 77.44%, recall from 66.0% to 98.96%, and F1 scores from 44.34% to 85.35%.
These findings underscore the benefits of model pretraining and of the dataset's distinctive attributes in strengthening model effectiveness for HD road map development, opening new possibilities for autonomous vehicle navigation systems.
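The abstract reports IoU, precision, recall, and F1 without spelling out how they are computed. The sketch below shows the standard pixel-wise definitions for a binary lane mask:

- IoU = TP / (TP + FP + FN)
- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- F1 = 2 · precision · recall / (precision + recall)

It rests on several assumptions not stated in the source:

- Masks are flattened lists of 0 (background) and 1 (lane marking).
- Mean IoU is averaged over the two classes.
- The function names are hypothetical.

The thesis may aggregate per image, per class, or over the whole dataset differently.

```python
from typing import Dict, Sequence


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when the denominator is empty (e.g. no lane pixels at all).
    return num / den if den else 0.0


def lane_metrics(pred: Sequence[int], target: Sequence[int]) -> Dict[str, float]:
    """Pixel-wise metrics for a binary mask (hypothetical helper; 1 = lane marking, 0 = background)."""
    if len(pred) != len(target):
        raise ValueError("prediction and ground truth must have the same number of pixels")

    pairs = list(zip(pred, target))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # lane predicted as lane
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # background predicted as lane
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # lane predicted as background
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # background predicted as background

    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)

    # Per-class IoU: for the background class, TN plays the role of TP,
    # while FN and FP swap roles as its false positives and false negatives.
    iou_lane = _safe_div(tp, tp + fp + fn)
    iou_background = _safe_div(tn, tn + fn + fp)
    miou = (iou_lane + iou_background) / 2  # assumed: unweighted mean over the two classes

    return {"precision": precision, "recall": recall, "f1": f1,
            "iou_lane": iou_lane, "iou_background": iou_background, "miou": miou}


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0]
    target = [1, 0, 0, 1, 1, 0]
    print(lane_metrics(pred, target)["miou"])  # → 0.5
```

For a single class and the same pixel counts, F1 and IoU are linked by F1 = 2 · IoU / (1 + IoU). Here a lane IoU of 0.5 gives an F1 of 2/3. Averaged scores computed over many images or classes need not obey this identity exactly.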