Parallel Multiplier Designs for the Galois/Counter Mode of Operation

Patel, Pujan

Parallel Multiplier Designs for the Galois/Counter Mode of Operation

Files

Final_Thesis_-_Pujan_Patel.pdf (2.02 MB)

Date

2008-06-09T17:33:59Z

Authors

Patel, Pujan

Publisher

University of Waterloo

Abstract

The Galois/Counter Mode of Operation (GCM), recently standardized by NIST, simultaneously authenticates and encrypts data at speeds not previously possible for both software and hardware implementations. In GCM, data integrity is achieved by chaining Galois field multiplication operations while a symmetric key block cipher such as the Advanced Encryption Standard (AES), is used to meet goals of confidentiality. Area optimization in a number of proposed high throughput GCM designs have been approached through implementing efficient composite Sboxes for AES. Not as much work has been done in reducing area requirements of the Galois multiplication operation in the GCM which consists of up to 30% of the overall area using a bruteforce approach. Current pipelined implementations of GCM also have large key change latencies which potentially reduce the average throughput expected under traditional internet traffic conditions. This thesis aims to address these issues by presenting area efficient parallel multiplier designs for the GCM and provide an approach for achieving low latency key changes. The widely known Karatsuba parallel multiplier (KA) and the recently proposed Fan-Hasan multiplier (FH) were designed for the GCM and implemented on ASIC and FPGA architectures. This is the first time these multipliers have been compared with a practical implementation, and the FH multiplier showed note worthy improvements over the KA multiplier in terms of delay with similar area requirements. Using the composite Sbox, ASIC designs of GCM implemented with subquadratic multipliers are shown to have an area savings of up to 18%, without affecting the throughput, against designs using the brute force Mastrovito multiplier. For low delay LUT Sbox designs in GCM, although the subquadratic multipliers are a part of the critical path, implementations with the FH multiplier showed the highest efficiency in terms of area resources and throughput over all other designs. FPGA results similarly showed a significant reduction in the number of slices using subquadratic multipliers, and the highest throughput to date for FPGA implementations of GCM was also achieved. The proposed reduced latency key change design, which supports all key types of AES, showed a 20% improvement in average throughput over other GCM designs that do not use the same techniques. The GCM implementations provided in this thesis provide some of the most area efficient, yet high throughput designs to date.