New Convolutional Neural Network Topology with Compressed Information to Enhance Accuracy for Image Classification Task

Date

2019-09-23

Authors

Jiang, Yanbing

Advisor

Yang, En-Hui

Publisher

University of Waterloo

Abstract

Source coding and deep learning are two major branches of information processing. Source coding encodes information that can be summarised by patterns into a representation, without regard to semantics. Deep learning, on the other hand, uses multiple layers of representation with increasing levels of abstraction to learn patterns that cannot be summarised easily. Interestingly, source coding itself makes significant contributions to the field of deep learning. The key to the success of deep learning is the cascade of non-linear layers that lets the network abstract multi-level features. Source coding, such as image compression, also contains fundamental non-linear operations, including quantisation and rounding. How the non-linearity introduced by compression could further help deep learning is the inspiration for this research, even though common sense suggests that compression usually degrades recognition ability.

This paper proposes integrating source coding and deep learning to improve accuracy in image classification, one of the most popular tasks in deep learning. When images are compressed, for example with JPEG, a human's ability to recognise the objects in them deteriorates. This is not necessarily the case from a machine's perspective: based on our observations, compressed images may be recognised better by a machine. To improve the accuracy of image recognition, this study focuses on the pre-processing applied to an image before it is fed into the neural network. We also propose a new Convolutional Neural Network (CNN) topology that takes the original input together with several compressed versions of it. JPEG compression is friendly to humans when images are compressed at higher quality; however, which compression level is machine friendly is uncertain. The topology combines the compressed information across inputs ranging from low to high quality and lets the machine learn from all of the potentially useful compressed information by itself.

We trained the topology with the proposed Block-by-block training method and increased the accuracy of state-of-the-art CNNs for image classification: by 0.374% in Top-1 accuracy and 0.346% in Top-5 accuracy for the Inception V3 model, and by 0.39% in Top-1 accuracy and 0.228% in Top-5 accuracy for the ResNet-50 V2 model. Moreover, visual observation suggests that compression can highlight the contrast of objects and discard interfering information, which helps our topology improve classification accuracy. Furthermore, we believe the accuracy gain could be even greater if our topology were applied to the state-of-the-art EfficientNet (published May 2019).
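
The idea of feeding a network the original image together with several compressed versions can be illustrated with a minimal sketch. This is not the thesis's implementation: it swaps JPEG for a plain uniform pixel quantiser (JPEG quantises DCT coefficients, not pixels), and the names `quantise`, `build_multi_version_input`, and the step sizes are hypothetical choices made only for illustration. It also shows why quantisation counts as a non-linear operation.

```python
from typing import List, Sequence

# Hypothetical grayscale image type: rows of pixel values in [0, 255].
Image = List[List[float]]


def quantise(image: Image, step: float) -> Image:
    """Snap each pixel to the nearest multiple of `step`, clamped to [0, 255].

    Assumption: this uniform quantiser is only a stand-in for JPEG,
    which quantises DCT coefficients rather than raw pixels.
    """
    return [
        [min(255.0, max(0.0, step * round(pixel / step))) for pixel in row]
        for row in image
    ]


def build_multi_version_input(image: Image, steps: Sequence[float]) -> List[Image]:
    """Return the original image followed by one quantised version per step.

    Larger steps play the role of lower compression quality.
    """
    return [image] + [quantise(image, step) for step in steps]


if __name__ == "__main__":
    image = [[12.0, 200.0], [97.0, 255.0]]
    # Coarse-to-fine steps mimic inputs ordered from low to high quality.
    for version in build_multi_version_input(image, steps=[64.0, 16.0, 4.0]):
        print(version)

    # Non-linearity check: quantising a sum is not the same as
    # summing the quantised parts.
    whole = quantise([[12.0]], 10.0)[0][0]
    parts = 2 * quantise([[6.0]], 10.0)[0][0]
    print(whole, parts)
```

In a real pipeline the list returned by `build_multi_version_input` would correspond to the set of inputs handed to the network, with an actual JPEG encoder at several quality settings in place of `quantise`.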

Keywords

Image Compression, Machine Learning, Deep Learning, GPU Utilization
