Show simple item record

dc.contributor.author: Yun, Stone
dc.description.abstract: Deep convolutional neural network (CNN) algorithms have emerged as a powerful tool for many computer vision tasks such as image classification, object detection, and semantic segmentation. However, these algorithms are computationally expensive and difficult to adapt to resource-constrained environments. With the proliferation of CNNs on mobile devices, there is a growing need for methods to reduce their latency and power consumption. Furthermore, we would like a principled approach to the design and understanding of CNN model behaviour. Computationally efficient CNN architecture design and inference with limited-precision arithmetic (commonly referred to as neural network quantization) have become ubiquitous techniques for speeding up CNN inference and reducing power consumption. This work describes a method for analyzing the quantized behaviour of efficient CNN architectures and subsequently leveraging those insights for the quantization-aware design of CNN models. We introduce a framework for fine-grained, layerwise analysis of CNN models during and after training. We present an in-depth, fine-grained ablation approach to understanding the effect of different design choices on the layerwise distributions of CNN weights and activations. This layerwise analysis enables us to gain deep insights into how the interaction of training data, hyperparameters, and CNN architecture ultimately affects quantized behaviour. Additionally, analysis of these distributions can yield further insights into how information propagates through the system. Various works have sought to design fixed-precision quantization algorithms and optimization techniques that minimize quantization-induced performance degradation. However, to the best of our knowledge, there has been no prior work focusing on a fine-grained analysis of why a given CNN's quantized behaviour is observed.
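The layerwise analysis the abstract describes can be illustrated with a minimal sketch (hypothetical code, not the thesis's actual framework): it collects per-layer weight statistics and the error introduced by simulated 8-bit uniform quantization. The layer names and shapes below are invented for illustration.

```python
import numpy as np

def quantize_uniform(w, n_bits=8):
    """Simulate symmetric uniform fixed-point quantization of a weight tensor."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    if scale == 0:
        return w.copy()
    return np.round(w / scale) * scale

def layerwise_report(layers, n_bits=8):
    """Per-layer distribution statistics and quantization-induced error."""
    report = []
    for name, w in layers.items():
        wq = quantize_uniform(w, n_bits)
        report.append({
            "layer": name,
            "range": (float(w.min()), float(w.max())),
            "std": float(w.std()),
            "quant_mse": float(np.mean((w - wq) ** 2)),
        })
    return report

rng = np.random.default_rng(0)
layers = {
    "conv1": rng.normal(0.0, 0.05, size=(32, 3, 3, 3)),
    # A layer with a few large outliers: its wider dynamic range
    # inflates the quantization step and hence the per-layer error.
    "conv2": np.concatenate([rng.normal(0.0, 0.05, 64 * 32 * 9),
                             np.array([2.0, -2.0])]),
}
for row in layerwise_report(layers):
    print(row)
```

Comparing the `quant_mse` column across layers is one way such an analysis can flag which layers' distributions (e.g. those with outlier weights) are most sensitive to fixed-precision quantization.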
We demonstrate the use of this framework in two contexts of quantization-aware model design. The first is a novel ablation study investigating the impact of random weight initialization on the final trained distributions of different CNN architectures and the resulting quantized accuracy. Next, we combine our analysis framework with a novel "progressive depth factorization" strategy for an iterative, systematic exploration of efficient CNN architectures under quantization constraints. We algorithmically increase the granularity of depth factorization in a progressive manner while observing the resulting change in layerwise distributions. Progressive depth factorization thus enables in-depth, layer-level insights into efficiency-accuracy tradeoffs. Coupling fine-grained analysis with progressive depth factorization frames our design in the context of quantized behaviour, enabling efficient identification of the optimal depth-factorized macroarchitecture for the desired efficiency-accuracy requirements under quantization.
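As a rough illustration of the efficiency side of depth factorization (a sketch under assumed layer shapes, not the thesis's actual search procedure), the following counts the weights of a single k×k convolution factorized into a grouped k×k convolution plus a 1×1 channel-mixing convolution, with the group count increased progressively up to fully depthwise:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with grouped input channels."""
    return (c_in // groups) * k * k * c_out

def progressive_factorization(c_in, c_out, k, group_schedule):
    """Parameter count as factorization granularity increases.

    Each grouped k x k conv is followed by a 1x1 conv to mix channels,
    in the style of depthwise-separable factorizations.
    """
    results = []
    for g in group_schedule:
        grouped = conv_params(c_in, c_in, k, groups=g)
        pointwise = conv_params(c_in, c_out, 1)
        results.append((g, grouped + pointwise))
    return results

# Progressively finer factorization of a hypothetical 64 -> 128 channel 3x3 layer.
for g, p in progressive_factorization(64, 128, 3, [1, 2, 4, 64]):
    print(f"groups={g:3d}  params={p}")
```

For these assumed shapes, the fully depthwise case (groups = 64) needs roughly an order of magnitude fewer weights than the unfactorized 3×3 convolution; sweeping the group count like this is one simple way to trace an efficiency-accuracy frontier before checking each candidate's quantized behaviour.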
dc.publisher: University of Waterloo
dc.relation.uri: CIFAR-10 image dataset:
dc.subject: computer vision
dc.subject: machine learning
dc.subject: convolutional neural networks
dc.subject: artificial intelligence
dc.subject: edge computing
dc.subject: efficient machine learning
dc.subject: deep learning
dc.subject: neural network quantization
dc.title: An Analysis Framework for the Quantization-Aware Design of Efficient, Low-Power Convolutional Neural Networks
dc.type: Master Thesis
dc.pending: false
uws-etd.degree.department: Systems Design Engineering
uws-etd.degree.discipline: Systems Design Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Master of Applied Science
uws.contributor.advisor: Wong, Alexander
uws.contributor.affiliation1: Faculty of Engineering

