Hardware Implementations of the WG-16 Stream Cipher with Composite Field Arithmetic

Zidaric, Nusa

Hardware Implementations of the WG-16 Stream Cipher with Composite Field Arithmetic

Files

Zidaric_Nusa.pdf (44.12 MB)

Date

2014-09-22

Authors

Zidaric, Nusa

Publisher

University of Waterloo

Abstract

The WG stream cipher family consists of stream ciphers based on the Welch-Gong (WG) transformations that are used as a nonlinear filter applied to the output of a linear feedback shift register (LFSR). The aim of this thesis is an exploration of the design space of the WG-16 stream cipher. Five different representations of the field elements were analyzed, namely the polynomial basis representation, the normal basis representation and three isomorphic tower field constructions of F216: F(((22)2)2)2, F(24)4 and F(28)2. Each design option begins with an in-depth description of different field constructions and their impact on the top-level WG transformation circuit. Normal basis representation of elements for each level of the tower was chosen for field constructions F(((22)2)2)2 and F(24)4, and a mixed basis, with polynomial basis for the lower and normal basis for the higher level of the tower for F(28)2. Representation of field elements affects the field arithmetic, which in turn affects the entire design. Targeting high throughput, pipelined architectures were developed, and pipelining was based on the particular field construction: each extension over the prime field offers a new pipelining possibility. Pipelining at a lower level of the tower field reduces the clock period. Most flexible pipelining options are possible for F(((22)2)2)2, a highly regular construction, which permits an algebraic optimization of the WG transformation resulting in two multiplications being removed. High speed, achieved by adequate pipelining granularity, and smaller area due to removed multipliers deem the F(((22)2)2)2 to be the most suitable field construction for the implementation of WG-16. The best WG-16 modules achieve a throughput of 222 Mbit/s with 476 slices used on the Xilinx Spartan-6 FPGA device xc6slx9 (using Xilinx Synthesis Tool (XST) for synthesis and ISE for implementation [47]) and a throughput of 529 Mbit/s with area cost of 12215 GEs for ASIC implementation, using the 65 nm CMOS technology (using Synopsys Design Compiler for synthesis [45] and Cadence SoC Encounter to complete the Place-and-Route phase).