UWSpace is currently experiencing technical difficulties resulting from its recent migration to a new version of its software. These technical issues are not affecting the submission and browse features of the site. UWaterloo community members may continue submitting items to UWSpace. We apologize for the inconvenience, and are actively working to resolve these technical issues.
 

HopliteBuf FPGA Network-on-Chip: Architecture and Analysis

Loading...
Thumbnail Image

Date

2019-04-22

Authors

Garg, Tushar

Journal Title

Journal ISSN

Volume Title

Publisher

University of Waterloo

Abstract

We can prove occupancy bounds of stall-free FIFOs used in deflection-free, low-cost, and high-speed FPGA overlay Network-on-chips (NoCs). In our work, we build on top of the HopliteRT livelock-free overlay NoC with an FPGA-friendly 2D unidirectional torus topology to propose the novel HopliteBuf NoC. In our new NoC, we strategically introduce stall-free FIFOs in the network and support these FIFOs with static analysis based on network calculus to compute FIFO occupancy, latency, and bandwidth bounds. The microarchitecture of HopliteBuf combines the performance benefits of conventional buffered NoCs (high throughput, low latency) with the cost advantages of deflection-routed NoCs (low FPGA area, high clock frequencies). Specifically, we look at two design variants of the HopliteBuf NoC: (1) Single corner-turn FIFO (W to S), and (2) Dual corner-turn FIFO (W to S+N). The single corner-turn (W to S) design is simpler and only introduces a buffering requirement for packets changing dimension from X ring to the downhill Y ring (or West to South). The dual corner-turn variant requires two FIFOs for turning packets going downhill (W to S) as well as uphill (W to N). The dual corner-turn design overcomes the mathematical analysis challenges associated with single corner-turn designs for communication workloads with cyclic dependencies between flow traversal paths at the expense of small increase in resource cost. Essentially, we resolve an analysis challenge with extra hardware resources. Across a range of 100 synthetically-generated workloads on a 5 x 5 NoC, HopliteBuf outperforms HopliteRT by 1.2-2x in terms of latency, 10% in terms of injection rate, and 30-60% in terms of flowset feasibiliy. These advantages come at the cost of 3-4x higher FPGA resource requirement for buffers and muxes. Our analysis also deliver latency bounds that are not only better than HopliteRT in absolute terms but also tighter by 2-3x allowing us to provision less hardware to meet our specifications.

Description

Keywords

network-on-chip, fpga, programmable-interconnect, unidirectional-torus, reconfigurable-computing

LC Keywords

Citation