Detection of Small Objects in UAV Images via an Improved Swin Transformer-based Model

Liang, Weidong

dc.contributor.advisor	Li, Jonathan
dc.contributor.advisor	Xu, Linlin
dc.contributor.author	Liang, Weidong
dc.date.accessioned	2023-05-23T17:40:45Z
dc.date.available	2023-05-23T17:40:45Z
dc.date.issued	2023-05-23
dc.date.submitted	2023-05-11
dc.description.abstract	Automated detection of small objects, such as vehicles, in images of complex urban environments taken by unmanned aerial vehicles (UAVs) is one of the most challenging tasks in the computer vision and remote sensing communities, with applications ranging from traffic congestion surveillance to vision systems for intelligent transportation. Deep learning models, most of which are based on convolutional neural networks (CNNs), have been widely used to detect objects in UAV images automatically. However, their detection accuracy is often still unsatisfactory because of the shortcomings of CNNs. For instance, a CNN aggregates information from neighboring pixels, but spatial information is lost through pooling operations; as a result, it is difficult for CNNs to model certain long-range dependencies. In this thesis, we propose a Swin Transformer-based model that incorporates convolutions into the Swin Transformer to extract more local information, mitigating the difficulty of detecting small objects against complex backgrounds in UAV images and further improving detection accuracy. By combining convolutions with the Swin Transformer, our model leverages both the local feature extraction of convolutions and the global feature modeling of transformers. The framework is built around two main modules: a local context enhancement (LCE) module and a Residual U-Feature Pyramid Network (RSU-FPN) module. The LCE module applies dilated convolutions to enlarge the receptive field of each image pixel. Combined with the Swin Transformer block, it efficiently encodes diverse spatial contextual information and captures local associations and structural information within UAV images. The RSU-FPN module is designed as a two-level nested U-shaped structure with skip connections that integrates multi-scale feature maps. A loss function combining the normalized Gaussian Wasserstein distance (NWD) and the L1 loss is also introduced, which allows the model to be trained on imbalanced data. 
The proposed method was compared with state-of-the-art methods on the UAVDT and VisDrone datasets. Our experimental results on the UAVDT dataset indicate that the proposed method increases the average precision (AP) by 21.6%, 22.3%, and 25.5% over the Cascade R-CNN, PVT, and Dynamic R-CNN detectors, respectively, demonstrating its effectiveness and reliability for small object detection in UAV images.	en
dc.identifier.uri	http://hdl.handle.net/10012/19468
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.title	Detection of Small Objects in UAV Images via an Improved Swin Transformer-based Model	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Applied Science	en
uws-etd.degree.department	Systems Design Engineering	en
uws-etd.degree.discipline	Systems Design Engineering	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Li, Jonathan
uws.contributor.advisor	Xu, Linlin
uws.contributor.affiliation1	Faculty of Engineering	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
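The abstract states that the LCE module uses dilated convolution to enlarge the receptive field of each pixel. The arithmetic behind that claim can be sketched in Python: a k x k kernel with dilation rate d spans k + (k - 1)(d - 1) pixels per side, and in a stack of stride-1 convolutions each layer adds (effective size - 1) pixels to the receptive field. The function names and the dilation rates (1, 2, 3) below are hypothetical illustration choices, not the configuration used in the thesis.

```python
from typing import Iterable, Tuple


def effective_kernel_size(k: int, d: int) -> int:
    """Spatial extent (per side) of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field of a stack of stride-1 convolutions.

    Each layer is a (kernel_size, dilation) pair. With stride 1, every layer
    adds (effective_kernel_size - 1) pixels to the receptive field.
    """
    rf = 1
    for k, d in layers:
        rf += effective_kernel_size(k, d) - 1
    return rf


if __name__ == "__main__":
    # Hypothetical configurations, for illustration only.
    plain = [(3, 1), (3, 1), (3, 1)]    # three ordinary 3x3 convolutions
    dilated = [(3, 1), (3, 2), (3, 3)]  # same kernels, increasing dilation
    print(receptive_field(plain))    # 7
    print(receptive_field(dilated))  # 13
```

Both stacks use the same number of weights (three 3 x 3 kernels), yet the dilated stack covers a 13 x 13 window instead of 7 x 7, which is the kind of context enlargement the abstract attributes to the LCE module. In PyTorch this corresponds to the `dilation` argument of `torch.nn.Conv2d`; for odd k, setting `padding = d * (k - 1) // 2` keeps the spatial size unchanged.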
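The abstract mentions a loss that combines the normalized Gaussian Wasserstein distance (NWD) with an L1 loss. Assuming the thesis follows the standard NWD formulation for tiny-object detection, each (cx, cy, w, h) box is treated as a 2D Gaussian with mean (cx, cy) and covariance diag(w^2/4, h^2/4), and the Wasserstein distance is normalized by a dataset-dependent constant C. The idea can be sketched as follows; the constant c = 12.8, the weight alpha, and all helper names are hypothetical placeholders, since the thesis's actual weighting is not given in this record.

```python
import math
from typing import Sequence


def nwd(box_a: Sequence[float], box_b: Sequence[float], c: float = 12.8) -> float:
    """Normalized Gaussian Wasserstein distance between two (cx, cy, w, h) boxes.

    Each box is modeled as a 2D Gaussian with mean (cx, cy) and covariance
    diag(w^2 / 4, h^2 / 4). For such Gaussians the squared 2-Wasserstein
    distance equals the squared Euclidean distance between the vectors
    (cx, cy, w / 2, h / 2). The result lies in (0, 1], with 1 for identical boxes.
    The constant c is a hypothetical placeholder for the dataset-dependent C.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def combined_loss(pred: Sequence[float], target: Sequence[float],
                  alpha: float = 0.5, c: float = 12.8) -> float:
    """Hypothetical weighted sum of an NWD-based loss (1 - NWD) and an L1 loss."""
    return alpha * (1.0 - nwd(pred, target, c)) + (1.0 - alpha) * l1_loss(pred, target)


if __name__ == "__main__":
    target = (10.0, 10.0, 4.0, 4.0)  # a tiny 4 x 4 box
    shifted = (13.0, 14.0, 4.0, 4.0)  # same size, offset by (3, 4) pixels
    print(nwd(target, target))        # 1.0
    print(nwd(target, shifted))       # exp(-5 / 12.8)
    print(combined_loss(target, shifted))
```

The shifted 4 x 4 box does not overlap the target at all, so its IoU is 0 and an IoU-based loss provides no signal about how far apart the boxes are; NWD still decreases smoothly with distance, which is why it is attractive for very small objects. In this toy example the L1 term is computed on raw pixel coordinates and therefore dominates; detectors usually apply L1 to normalized regression targets.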

Files

Original bundle

Name: Liang_Weidong.pdf
Size: 82.71 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission

Collections

Theses
Systems Design Engineering