Video-Based Object Detection in Security Monitoring System
Date
2022-09-28
Authors
Li, Chao
Advisor
Ban, Dayan
Wang, Zhou
Publisher
University of Waterloo
Abstract
Object detection technology is widely used in many real-world applications.
With the development of deep learning methods, the accuracy and speed of object detection have improved significantly, showing great promise for increasing
the efficiency of security-related business activities. Nevertheless, existing
object detection methods still lack robustness on security video datasets, which can
substantially reduce performance in complex application scenarios such as varying target size, target occlusion and bad weather. These problems cannot be fully solved by image-based
object detection because the information in a single image is limited. A video, on the other hand,
consists of a series of still frames that carry rich temporal and spatial information,
which can be used to supplement the detection method. Based on this idea, this
thesis proposes an incremental optimization method that addresses the existing problems of
object detection methods. We first improve the accuracy of the image-based object
detection method by adding new features, and then aggregate the temporal and spatial
information of the target to enhance the performance of the video-based object detection method. Furthermore, a multi-layer feature cascade aggregation pyramid structure
is adopted based on the traditional Faster-RCNN model. Faster-RCNN is one of the
best-known convolutional neural networks used in object detection and recognition tasks,
and was first proposed in 2015. It replaced the traditional selective search method with
the region proposal network (RPN), which improved detection speed significantly. Because
of its excellent detection performance, many recently proposed approaches still select it
as the backbone network. The new multi-layer feature cascade aggregation feature pyramid network (MCA-FPN) combines deep and shallow semantic feature information to
optimize feature utilization and improve the feature representation of targets of any size. In
order to address the negative effects generated by the imbalanced distribution of samples,
a sample asymmetric weighted loss function (SAW-Loss) is proposed, which improves the
efficiency of network training. Experimental results show that the proposed MCA-FPN and SAW-Loss modules improve the mAP of the traditional FPN by 2.4% and 1.5%,
respectively, and that the final improved object detection algorithm with both modules
obtains an mAP of 86.0% on the Pascal VOC dataset, which is higher than the mAP of 82.1%
obtained with FPN. The proposed method performs significantly better than most
existing methods, such as FCOS with an mAP of 78.7%, RFBNet with an mAP of 82.2%
and PFPNet with an mAP of 84.1%.
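The abstract does not give the formula for SAW-Loss, so the following is only a generic sketch of the underlying idea of weighting samples asymmetrically to counter an imbalanced sample distribution. The function name `weighted_bce` and the parameters `pos_weight` and `neg_weight` are hypothetical and are not taken from the thesis.

```python
import math


def weighted_bce(probs, labels, pos_weight=2.0, neg_weight=1.0):
    """Mean binary cross-entropy with separate weights for positive and
    negative samples. Illustrative only; this is NOT the SAW-Loss formula."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            # Positive (often rarer) samples contribute with pos_weight.
            total += -pos_weight * math.log(p)
        else:
            # Negative (often abundant) samples contribute with neg_weight.
            total += -neg_weight * math.log(1.0 - p)
    return total / len(probs)


# One positive and one negative sample, both predicted at 0.5.
loss = weighted_bce([0.5, 0.5], [1, 0])
```

With `pos_weight` larger than `neg_weight`, a misclassified positive sample costs more than a misclassified negative one, which is one common way to keep abundant background samples from dominating training.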
Video-based methods may make use of two types of information: local information,
which is obtained from adjacent frames, and global information, which is extracted from
the whole video sequence. We propose two types of information aggregation methods, namely local
information aggregation and global information aggregation, based on feature similarity
and the attention mechanism, so as to aggregate features selectively by including more
of the correlated feature information and less of the uncorrelated feature information.
As a result, the network can extract and learn more useful target features and discard
features that suffer from interference. The proposed local and global information aggregation
methods improve accuracy by 0.9% and 1.1%, respectively, compared with MEGA, one of the most
advanced video-based object detection methods. With both modules added, the
mAP of the proposed method reaches 84.6% on the public dataset ImageNet VID, which is
1.7% higher than the mAP of MEGA. The proposed method also shows potential
to detect occluded targets with high confidence.
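As a rough illustration of similarity-based selective aggregation, the sketch below weights the features of support frames by their cosine similarity to the current frame's feature and passes the similarities through a softmax. It is a minimal sketch under stated assumptions, not the aggregation module of the thesis: the names `cosine` and `aggregate` and the `temperature` parameter are hypothetical, and each frame is reduced to a single feature vector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def aggregate(key, supports, temperature=0.1):
    """Blend support-frame features, weighted by similarity to the key frame.

    Returns the aggregated feature vector and the softmax weights.
    """
    sims = [cosine(key, s) for s in supports]
    m = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp((s - m) / temperature) for s in sims]
    z = sum(exps)
    weights = [e / z for e in exps]
    agg = [sum(w * s[i] for w, s in zip(weights, supports))
           for i in range(len(key))]
    return agg, weights


# The first support frame matches the key frame; the second is unrelated.
agg, weights = aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Correlated support features receive most of the weight while uncorrelated ones are largely suppressed, which mirrors the selective aggregation described above. A lower `temperature` makes the selection sharper.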