An Analysis Framework for the Quantization-Aware Design of Efficient, Low-Power Convolutional Neural Networks
Deep convolutional neural network (CNN) algorithms have emerged as a powerful tool for many computer vision tasks such as image classification, object detection, and semantic segmentation. However, these algorithms are computationally expensive and difficult to adapt to resource-constrained environments. With the proliferation of CNNs on mobile devices, there is a growing need for methods that reduce their latency and power consumption. Furthermore, we would like a principled approach to designing CNN models and understanding their behaviour. Computationally efficient CNN architecture design and inference with limited-precision arithmetic (commonly referred to as neural network quantization) have become ubiquitous techniques for speeding up CNN inference and reducing power consumption. This work describes a method for analyzing the quantized behaviour of efficient CNN architectures and subsequently leveraging those insights for quantization-aware design of CNN models. We introduce a framework for fine-grained, layerwise analysis of CNN models during and after training. We present an in-depth, fine-grained ablation approach to understanding the effect of different design choices on the layerwise distributions of weights and activations of CNNs. This layerwise analysis enables us to gain deep insights into how the interaction of training data, hyperparameters, and CNN architecture can ultimately affect quantized behaviour. Additionally, analysis of these distributions can yield further insights into how information propagates through the network. Various works have sought to design fixed-precision quantization algorithms and optimization techniques that minimize quantization-induced performance degradation. However, to the best of our knowledge, no prior work has focused on a fine-grained analysis of why a given CNN's quantization behaviour is observed.
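The kind of layerwise analysis described above can be illustrated with a minimal sketch: for each layer, summarize the weight distribution and estimate the error introduced by quantizing it. The symmetric uniform 8-bit quantizer, the layer shapes, and the `layer_stats` helper below are illustrative assumptions, not the thesis's actual framework or quantization scheme.

```python
import numpy as np

def layer_stats(weights, num_bits=8):
    """Summarize a layer's weight distribution and the error introduced by
    symmetric uniform quantization to num_bits (a common fixed-precision
    scheme; the actual quantizer under study may differ)."""
    w = weights.ravel()
    # Scale chosen so the largest-magnitude weight maps to the max int level.
    scale = np.abs(w).max() / (2 ** (num_bits - 1) - 1)
    w_q = np.round(w / scale) * scale  # quantize-dequantize round trip
    return {
        "min": w.min(), "max": w.max(),
        "mean": w.mean(), "std": w.std(),
        "quant_mse": np.mean((w - w_q) ** 2),
    }

# Toy "network": three conv layers with progressively wider weight
# distributions, to show how the per-layer statistics differ.
rng = np.random.default_rng(0)
layers = {f"conv{i}": rng.normal(0.0, 0.1 * (i + 1), size=(64, 64, 3, 3))
          for i in range(3)}
for name, w in layers.items():
    s = layer_stats(w)
    print(f"{name}: std={s['std']:.4f} quant_mse={s['quant_mse']:.2e}")
```

Because the quantization step size grows with the layer's dynamic range, the wider-distributed layers show larger quantization error, which is exactly the kind of per-layer effect such an analysis is meant to surface.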
We demonstrate the use of this framework in two contexts of quantization-aware model design. The first is a novel ablation study investigating the impact of random weight initialization on the final trained distributions of different CNN architectures and the resulting quantized accuracy. Next, we combine our analysis framework with a novel "progressive depth factorization" strategy for iterative, systematic exploration of efficient CNN architectures under quantization constraints. We algorithmically increase the granularity of depth factorization in a progressive manner while observing the resulting changes in layerwise distributions. Thus, progressive depth factorization enables in-depth, layer-level insights into efficiency-accuracy tradeoffs. Coupling fine-grained analysis with progressive depth factorization frames our design in the context of quantized behaviour, enabling efficient identification of the optimal depth-factorized macroarchitecture for the desired efficiency-accuracy requirements under quantization.
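The efficiency side of depth factorization can be sketched by counting the parameters of a grouped convolution as the factorization granularity increases. The group counts and layer sizes below are illustrative assumptions for exposition, not the thesis's actual search space; at the finest granularity (groups equal to the channel count) the layer becomes a depthwise convolution.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Parameter count of a k x k convolution with `groups` groups.
    Each group sees only c_in/groups input channels, so parameters
    shrink by a factor of `groups` relative to a standard conv."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k

# Progressively finer depth factorization of one 3x3 conv layer
# (illustrative sizes): groups=1 is a standard convolution,
# groups=c_in is a depthwise convolution.
c_in, c_out, k = 128, 128, 3
for g in (1, 2, 4, 8, c_in):
    print(f"groups={g:3d}: params={conv_params(c_in, c_out, k, groups=g)}")
```

Sweeping the group count like this, while observing the resulting layerwise weight and activation distributions, is the iterative exploration loop that progressive depth factorization describes: each step trades representational capacity for efficiency, and the analysis framework shows how that trade affects quantized behaviour.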
Cite this version of the work
Stone Yun (2022). An Analysis Framework for the Quantization-Aware Design of Efficient, Low-Power Convolutional Neural Networks. UWSpace. http://hdl.handle.net/10012/18196