Implementing FPGA-optimized Systolic Arrays using 2D Knapsack and Evolutionary Algorithms
Date
2022-01-25
Authors
Chan, Long Chan
Publisher
University of Waterloo
Abstract
Underutilization of FPGA resources is a significant challenge in deploying FPGAs as neural network accelerators.
We propose an FPGA-optimized systolic array architecture that, through parallelism-aware partitioning of on-chip resources, improves CNN inference throughput by orders of magnitude compared to an un-partitioned systolic array.
We fracture the FPGA into multiple square systolic arrays and formulate the placement of these arrays as a 2D knapsack problem.
We simulate the cycle counts needed for each neural network layer at different systolic array sizes using SCALESim, a cycle-accurate systolic array simulator. We generate physical implementations and operating frequencies for systolic arrays placed at uniformly staggered locations on Xilinx VU37P and VU9P UltraScale+ platforms.
We feed the cycle and frequency information into an optimizer that couples the CMA-ES evolutionary algorithm with a simple 2D knapsack solver to discover packable and routable partitioned designs that maximize throughput.
In our experiments, the largest performance improvement comes from layers with large kernel sizes. We demonstrate that inference throughput gains of 7-22.7x are possible at the cost of a 1.2-7.6x increase in latency.
Our optimization tool achieves up to ~8x higher throughput gain on eight MLPerf benchmark network topologies. It also generates designs across a range of latency and throughput combinations, giving designers a wide selection of trade-offs.
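As a rough illustration of the packing-and-scoring idea in the abstract, the sketch below checks whether a set of square arrays fits on a rectangular resource grid and estimates the aggregate throughput from per-array cycle counts and clock frequencies. It is not the thesis's solver: the greedy shelf heuristic, the function names `shelf_pack` and `throughput`, the grid dimensions, and the assumption that every array processes independent inputs are all hypothetical stand-ins chosen for brevity.

```python
from typing import Dict, List, Optional, Tuple


def shelf_pack(sizes: List[int], width: int, height: int
               ) -> Optional[List[Tuple[int, int, int]]]:
    """Greedy shelf packing of squares on a width x height grid (hypothetical heuristic).

    Returns one (x, y, side) tuple per square, or None if the squares do not fit.
    """
    placements = []
    x = y = shelf_h = 0
    for side in sorted(sizes, reverse=True):  # largest squares first
        if side > width:
            return None
        if x + side > width:  # current shelf is full, open a new one above it
            y += shelf_h
            x = 0
            shelf_h = 0
        if shelf_h == 0:
            shelf_h = side  # first square on a shelf sets the shelf height
        if y + side > height:
            return None
        placements.append((x, y, side))
        x += side
    return placements


def throughput(sizes: List[int],
               cycles_per_inference: Dict[int, int],
               freq_hz: Dict[int, float]) -> float:
    """Sum of inferences per second, assuming each array works on its own input."""
    return sum(freq_hz[s] / cycles_per_inference[s] for s in sizes)


# Made-up numbers purely for illustration.
candidate = [4, 4, 2, 2]
layout = shelf_pack(candidate, width=8, height=6)
if layout is not None:
    rate = throughput(candidate, {4: 100, 2: 400}, {4: 200.0, 2: 400.0})
```

In a setup like the one the abstract describes, an outer search (there, CMA-ES) would propose candidate size lists, and a feasibility check of this kind would reject candidates that cannot be packed before they are scored.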
Keywords
FPGA, Neural Network, Systolic Array, Evolutionary Algorithms