An Analysis Framework for the Quantization-Aware Design of Efficient, Low-Power Convolutional Neural Networks

Yun, Stone

dc.contributor.advisor	Wong, Alexander
dc.contributor.author	Yun, Stone
dc.date.accessioned	2022-04-29T13:30:17Z
dc.date.available	2022-04-29T13:30:17Z
dc.date.issued	2022-04-29
dc.date.submitted	2022-04-22
dc.description.abstract	Deep convolutional neural network (CNN) algorithms have emerged as a powerful tool for many computer vision tasks such as image classification, object detection, and semantic segmentation. However, these algorithms are computationally expensive and difficult to adapt to resource-constrained environments. With the proliferation of CNNs on mobile devices, there is a growing need for methods to reduce their latency and power consumption. Furthermore, we would like a principled approach to the design and understanding of CNN model behaviour. Computationally efficient CNN architecture design and inference with limited-precision arithmetic (commonly referred to as neural network quantization) have become ubiquitous techniques for speeding up CNN inference and reducing power consumption. This work describes a method for analyzing the quantized behaviour of efficient CNN architectures and subsequently leveraging those insights for quantization-aware design of CNN models. We introduce a framework for fine-grained, layerwise analysis of CNN models during and after training. We present an in-depth, fine-grained ablation approach to understanding the effect of different design choices on the layerwise distributions of weights and activations of CNNs. This layerwise analysis enables us to gain deep insights into how the interaction of training data, hyperparameters, and CNN architecture can ultimately affect quantized behaviour. Additionally, analysis of these distributions can yield further insights into how information propagates through the system. Various works have sought to design fixed-precision quantization algorithms and optimization techniques that minimize quantization-induced performance degradation. However, to the best of our knowledge, no prior work has focused on a fine-grained analysis of why a given CNN's quantization behaviour is observed.
We demonstrate the use of this framework in two contexts of quantization-aware model design. The first is a novel ablation study investigating the impact of random weight initialization on the final trained distributions of different CNN architectures and the resulting quantized accuracy. Next, we combine our analysis framework with a novel "progressive depth factorization" strategy for an iterative, systematic exploration of efficient CNN architectures under quantization constraints. We algorithmically increase the granularity of depth factorization in a progressive manner while observing the resulting change in layerwise distributions. Progressive depth factorization thus yields in-depth, layer-level insights into efficiency-accuracy tradeoffs. Coupling fine-grained analysis with progressive depth factorization frames our design in the context of quantized behaviour, enabling efficient identification of the optimal depth-factorized macroarchitecture design based on the desired efficiency-accuracy requirements under quantization.	en
dc.identifier.uri	http://hdl.handle.net/10012/18196
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.relation.uri	CIFAR-10 image dataset: https://www.cs.toronto.edu/~kriz/cifar.html	en
dc.subject	computer vision	en
dc.subject	machine learning	en
dc.subject	convolutional neural networks	en
dc.subject	artificial intelligence	en
dc.subject	edge computing	en
dc.subject	efficient machine learning	en
dc.subject	deep learning	en
dc.subject	neural network quantization	en
dc.title	An Analysis Framework for the Quantization-Aware Design of Efficient, Low-Power Convolutional Neural Networks	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Applied Science	en
uws-etd.degree.department	Systems Design Engineering	en
uws-etd.degree.discipline	Systems Design Engineering	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Wong, Alexander
uws.contributor.affiliation1	Faculty of Engineering	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Yun_Stone.pdf
Size: 7.53 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed to upon submission

Collections

Theses
Systems Design Engineering