NengoFPGA: an FPGA Backend for the Nengo Neural Simulator

Morcos, Benjamin


dc.contributor.advisor	Kapre, Nachiket
dc.contributor.author	Morcos, Benjamin
dc.date.accessioned	2019-08-22T18:36:43Z
dc.date.available	2019-08-22T18:36:43Z
dc.date.issued	2019-08-22
dc.date.submitted	2019-08-09
dc.description.abstract	Low-power, high-speed neural networks are critical for deploying embedded AI applications at the edge. We describe a Xilinx FPGA implementation of Neural Engineering Framework (NEF) networks with online learning that outperforms mobile Nvidia GPU implementations by an order of magnitude or more. Specifically, we provide an embedded, Python-capable PYNQ FPGA implementation, supported by a Xilinx Vivado High-Level Synthesis (HLS) workflow, that allows sub-millisecond implementation of adaptive neural networks with low-latency, direct I/O access to the physical world. The outcome of this work is NengoFPGA, a seamless and user-friendly extension to Nengo, the neural compiler Python package. To reduce memory requirements and improve performance, we tune the precision of the different intermediate variables in the code to achieve absolute accuracy competitive with slower and larger floating-point reference designs. The online learning component of the neural network exploits immediate feedback to adjust the network weights to best support a given arithmetic precision. Because the space of possible design configurations for such quantized networks is vast and subject to a target accuracy constraint, we use the Hyperopt hyper-parameter tuning tool instead of manual search to find Pareto-optimal designs. We are able to generate the optimized designs in under 500 short iterations of Vivado HLS C synthesis before running the complete Vivado place-and-route phase on that subset, a much longer process that is not conducive to rapid exploration. For neural network populations of 64–4096 neurons and 1–8 representational dimensions, our optimized FPGA implementation generated by Hyperopt achieves a speedup of 10–484× over a competing cuBLAS implementation on the Jetson TX1 GPU while using 2.4–9.5× less power.
Our speedups result from HLS-specific reformulation (15× improvement), precision adaptation (3× improvement), and low-latency direct I/O access (1000× improvement).	en
dc.identifier.uri	http://hdl.handle.net/10012/14923
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	neural networks	en
dc.subject	FPGA	en
dc.subject	nengo	en
dc.subject	high-level synthesis	en
dc.title	NengoFPGA: an FPGA Backend for the Nengo Neural Simulator	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Applied Science	en
uws-etd.degree.department	Electrical and Computer Engineering	en
uws-etd.degree.discipline	Electrical and Computer Engineering	en
uws-etd.degree.grantor	University of Waterloo	en
uws.contributor.advisor	Kapre, Nachiket
uws.contributor.affiliation1	Faculty of Engineering	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name:: morcos_benjamin.pdf
Size:: 1.42 MB
Format:: Adobe Portable Document Format
Description:: Thesis

License bundle

Name:: license.txt
Size:: 6.08 KB
Format:: Item-specific license agreed to upon submission
Description:

Collections

Theses
Electrical and Computer Engineering