Video-Based Object Detection in Security Monitoring System
Object detection technology is widely used in real-world applications. With the development of deep learning, the accuracy and speed of object detection methods have improved significantly, showing great promise for increasing the efficiency of security-related business activities. Nevertheless, existing object detection methods still lack robustness on security video datasets, which can substantially reduce performance in complex application scenarios such as varying target size, target occlusion, and bad weather. Image-based object detection cannot fully solve this problem because the information in a single image is limited. A video, on the other hand, consists of a series of still images that carry rich temporal and spatial information, which can supplement the detection method. Based on this idea, this thesis proposes an incremental optimization method that addresses these problems of existing object detection methods. We first improve the accuracy of the image-based object detection method by adding new features, and then aggregate the temporal and spatial information of the target to enhance the performance of the video-based object detection method. Furthermore, a multi-layer feature cascade aggregation pyramid structure is adopted on top of the traditional Faster-RCNN model. Faster-RCNN, first proposed in 2015, is one of the best-known convolutional neural networks for object detection and recognition tasks. It replaced the traditional selective search method with a region proposal network (RPN), which improved detection speed significantly. Because of its excellent detection performance, many recently proposed approaches still use it as the backbone network.
The new multi-layer feature cascade aggregation feature pyramid network (MCA-FPN) combines deep and shallow semantic feature information to optimize feature utilization and improve the feature representation of targets of any size. To address the negative effects of an imbalanced sample distribution, a sample asymmetric weighted loss function (SAW-Loss) is proposed, which improves the efficiency of network training. Experimental results show that the proposed MCA-FPN and SAW-Loss modules improve the mAP of the traditional FPN by 2.4% and 1.5%, respectively, and that the final improved object detection algorithm with both modules obtains an mAP of 86.0% on the Pascal VOC dataset, compared with 82.1% for FPN. The proposed method performs significantly better than most existing methods, such as FCOS (mAP of 78.7%), RFBNet (mAP of 82.2%), and PFPNet (mAP of 84.1%). Video-based methods can make use of two types of information: local information, which is obtained from adjacent frames, and global information, which is extracted from the whole video sequence. We propose two information aggregation methods, local information aggregation and global information aggregation, based on feature similarity and the attention mechanism. They aggregate features selectively, including more of the correlated feature information and less of the uncorrelated feature information. As a result, the network can extract and learn more useful target features and discard interfering features. The proposed local and global information aggregation methods improve accuracy by 0.9% and 1.1%, respectively, compared with MEGA, one of the most advanced video-based object detection methods. With both modules added, the mAP of the proposed method reaches 84.6% on the public ImageNet VID dataset, which is 1.7% higher than the mAP of MEGA.
The proposed method also shows potential for detecting occluded targets with high confidence.
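The abstract describes selective aggregation only at a high level: features from other frames are weighted by their similarity to the current frame, so correlated features contribute more and uncorrelated ones less. The following is a minimal illustrative sketch of that general idea, not the thesis's actual implementation. The function names, the use of cosine similarity, the softmax weighting, and the `temperature` parameter are all assumptions made for illustration. Under this sketch, local and global aggregation would differ only in which support frames are passed in (adjacent frames versus frames sampled from the whole video).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as uncorrelated
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax; lower temperature sharpens the weights."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_features(key_feature, support_features, temperature=0.1):
    """Hypothetical similarity-weighted aggregation (assumed form).

    key_feature: feature vector of the target in the current frame.
    support_features: feature vectors taken from other frames.
    Returns the weighted sum of the support features and the weights used.
    """
    sims = [cosine_similarity(key_feature, f) for f in support_features]
    weights = softmax(sims, temperature)
    dim = len(key_feature)
    aggregated = [
        sum(w * f[i] for w, f in zip(weights, support_features))
        for i in range(dim)
    ]
    return aggregated, weights


# Toy example: two support features resemble the key, one does not.
key = [1.0, 0.0]
supports = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
aggregated, weights = aggregate_features(key, supports)
```

In this toy example the second support feature is orthogonal to the key feature, so it receives a weight close to zero and contributes almost nothing to the aggregated result.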
Cite this version of the work
Chao Li (2022). Video-Based Object Detection in Security Monitoring System. UWSpace. http://hdl.handle.net/10012/18837