Efficient Hardware Realization of Convolutional Neural Networks using Intra-Kernel Regular Pruning

Yang, Maurice

dc.contributor.advisor	Gaudet, Vincent
dc.contributor.author	Yang, Maurice
dc.date.accessioned	2019-05-24T18:47:01Z
dc.date.available	2019-05-24T18:47:01Z
dc.date.issued	2019-05-24
dc.date.submitted	2019-05-17
dc.description.abstract	Convolutional neural networks (CNNs) have proven successful in a wide range of applications. While CNNs boast remarkable performance, they require significant computational and memory resources to operate. As research strives towards higher classification accuracy, CNN topologies have increased in depth, complexity, and size. In response, algorithmic-level optimizations have been proposed to reduce the size of CNNs while retaining classification accuracy. While these advances promise savings in theory, they often underperform in practice, especially when adopted into hardware. In order to achieve practical savings, algorithmic changes must be considered from a hardware perspective, which necessitates a software-hardware codesign philosophy. We propose an Intra-Kernel Regular (IKR) pruning scheme to reduce the size and computational complexity of CNNs by removing redundant weights at a fine-grained level without loss in classification accuracy. Unlike other pruning methods such as Fine-Grained pruning, IKR pruning maintains regular kernel structures and employs data compression techniques that translate well into hardware. At the hardware level, we propose an FPGA design framework targeting IKR-pruned CNNs. The organisational structure of the design offers potential for high parallelism and efficient utilization of on-chip resources. Experimental results in software demonstrate up to a 10× reduction in weights and a 7× reduction in computation at a cost of less than 1% degradation in accuracy versus the un-pruned case. Evaluation of the accelerator indicates computational speeds of up to 77.7 GOP/S (effectively 403 GOP/S), with each DSP effectively performing 0.53 GOP/S.	en
dc.identifier.uri	http://hdl.handle.net/10012/14716
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	neural network	en
dc.subject	FPGA	en
dc.subject	software-hardware codesign	en
dc.title	Efficient Hardware Realization of Convolutional Neural Networks using Intra-Kernel Regular Pruning	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Applied Science	en
uws-etd.degree.department	Electrical and Computer Engineering	en
uws-etd.degree.discipline	Electrical and Computer Engineering	en
uws-etd.degree.grantor	University of Waterloo	en
uws.contributor.advisor	Gaudet, Vincent
uws.contributor.affiliation1	Faculty of Engineering	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Yang_Maurice.pdf
Size: 2.08 MB
Format: Adobe Portable Document Format
Description: Thesis

Collections

Theses
Electrical and Computer Engineering