University of Waterloo
Permanent URI for this communityhttps://uwspace.uwaterloo.ca/handle/10012/1
This institution-wide community is designed to house collections of Waterloo created electronic research that span the institution or are not logically subsumed by a single faculty sub-community.
Browse
Browsing University of Waterloo by Author "Aagaard, Mark"
Now showing 1 - 6 of 6
- Results Per Page
- Sort Options
Item Abstraction and Refinement Techniques for Ternary Symbolic Simulation with Guard-value Encoding(University of Waterloo, 2022-05-20) Yang, Bo; Aagaard, MarkWe propose a novel encoding called guard-value encoding for the ternary domain {0, 1, X}. Among the advantages it has over the more conventional dual-rail encoding, the flexibility of representing X with either of <0, 0> or <0, 1> is especially important. We develop data abstraction and memory abstraction techniques based on the guard-value encoding. Our data abstraction reduces much more of the state space than conventional ternary abstraction's approach of over-approximating a set of Boolean values with a smaller set of ternary values. We also show how our data abstraction can enable bit-width reduction which helps further simplify verification problems. Our memory abstraction is applicable to any array of elements which makes it much more general than the existing memory abstraction techniques. We show how our memory abstraction can effectively reduce an array to just a few elements even when existing approaches are not applicable. We make extensive use of symbolic indexing to construct symbolic ternary values which are used in symbolic simulation. Lastly, we give a new perspective on refinement for ternary abstraction. Refinement is needed when too much information is lost due to use of the ternary domain such that the property is evaluated to the unknown X. We present a collection of new refinement approaches that distinguish themselves from existing ones by modifying the transition function instead of the initial ternary state and ternary stimulus. This way, our refinement either preserves the abstraction level or only degrades it slightly. We demonstrate our proposed techniques with a wide range of designs and properties. With data abstraction, we usually observe at least 10X improvement in verification time compared to Boolean verification algorithms such as Boolean Bounded Model Checking (BMC), as well as usually at least 2X and often 10X improvement over conventional ternary abstraction. Our memory abstraction significantly improves how the verification time scales with the design parameters and the depth (the number of cycles) of the verification. Our refinement approaches are also demonstrated to be much better than existing ones most of the time. For example, when verifying a property of a synthetic example based on a superscalar microprocessor's bypass paths, with our data abstraction, it takes 505 seconds while both of ternary abstraction and BMC time out at 1800 seconds. The bit-width reduction can further save 44 seconds and our memory abstraction can save 237 seconds. This verification problem requires refinement. If we substitute our refinement with an existing approach, the verification time with the data abstraction doubles.Item Area and Energy Optimizations in ASIC Implementations of AES and PRESENT Block Ciphers(University of Waterloo, 2020-05-25) Yu, Jenny; Aagaard, MarkWhen small, modern-day devices surface with neoteric features and promise benefits like streamlined business processes, cashierless stores, and autonomous driving, they are all too often accompanied by security risks due to a weak or absent security component. In particular, the lack of data privacy protection is a common concern that can be remedied by implementing encryption. This ensures that data remains undisclosed to unauthorized parties. While having a cryptographic module is often a goal, it is sometimes forfeited because a device's resources do not allow for the conventional cryptographic solutions. Thus, smaller, lower-energy security modules are in demand. Implementing a cipher in hardware as an application-specific integrated circuit (ASIC) will usually achieve better efficiency than alternatives like FPGAs or software, and can help towards goals such as extended battery life and smaller area footprint. The Advanced Encryption Standard (AES) is a block cipher established by the National Institute of Standards and Technology (NIST) in 2001. It has since become the most widely adopted block cipher and is applied in a variety of applications ranging from smartphones to passive RFID tags to high performance microprocessors. PRESENT, published in 2007, is a smaller lightweight block cipher designed for low-power applications. In this study, low-area and low-energy optimizations in ASICs are addressed for AES and PRESENT. In the low-area work, three existing AES encryption cores are implemented, analyzed, and benchmarked using a common fabrication technology (STM 65 nm). The analysis includes an examination of various implementations of internal AES operations and their suitability for different architectural choices. Using our taxonomy of design choices, we designed Quark-AES, a novel 8-bit AES architecture. At 1960 GE, it features a 13% improvement in area and 9% improvement in throughput/area² over the prior smallest design. To illustrate the extent of the variations due to the use of different ASIC libraries, Quark-AES and the three analyzed designs are also synthesized using three additional technologies. Even for the same transistor size, different ASIC libraries produce significantly different area results. To accommodate a variety of applications that seek different levels of tradeoffs in area and throughput, we extend all four designs to 16-bit and 32-bit datawidths. In the low-energy work, round unrolling and glitch filtering are applied together to achieve energy savings. Round unrolling, which applies multiple block cipher rounds in a combinational path, reduces the energy due to registers but increases the glitching energy. Glitch filtering complements round unrolling by reducing the amount of glitches and their associated energy consumption. For unrolled designs of PRESENT and AES, two glitch filtering schemes are assessed. One method uses AND-gates in between combinational rounds while the other used latches. Both methods work by allowing the propagation of signals only after they have stabilized. The experiments assess how energy consumption changes with respect to the degree of unrolling, the glitch filtering scheme, the degree of pipelining, the spacing between glitch filters, and the location of glitch filters when only a limited number of them can be applied due to area constraints. While in PRESENT, the optimal configuration depends on all the variables, in a larger cipher such as AES, the latch-based method consistently offers the most energy savings.Item Automated Design Space Exploration and Datapath Synthesis for Finite Field Arithmetic with Applications to Lightweight Cryptography(University of Waterloo, 2020-05-27) Zidaric, Nusa; Aagaard, Mark; Gong, GuangToday, emerging technologies are reaching astronomical proportions. For example, the Internet of Things has numerous applications and consists of countless different devices using different technologies with different capabilities. But the one invariant is their connectivity. Consequently, secure communications, and cryptographic hardware as a means of providing them, are faced with new challenges. Cryptographic algorithms intended for hardware implementations must be designed with a good trade-off between implementation efficiency and sufficient cryptographic strength. Finite fields are widely used in cryptography. Examples of algorithm design choices related to finite field arithmetic are the field size, which arithmetic operations to use, how to represent the field elements, etc. As there are many parameters to be considered and analyzed, an automation framework is needed. This thesis proposes a framework for automated design, implementation and verification of finite field arithmetic hardware. The underlying motif throughout this work is “math meets hardware”. The automation framework is designed to bring the awareness of underlying mathematical structures to the hardware design flow. It is implemented in GAP, an open source computer algebra system that can work with finite fields and has symbolic computation capabilities. The framework is roughly divided into two phases, the architectural decisions and the automated design genera- tion. The architectural decisions phase supports parameter search and produces a list of candidates. The automated design generation phase is invoked for each candidate, and the generated VHDL files are passed on to conventional synthesis tools. The candidates and their implementation results form the design space, and the framework allows rapid design space exploration in a systematic way. In this thesis, design space exploration is focused on finite field arithmetic. Three distinctive features of the proposed framework are the structure of finite fields, tower field support, and on the fly submodule generation. Each finite field used in the design is represented as both a field and its corresponding vector space. It is easy for a designer to switch between fields and vector spaces, but strict distinction of the two is necessary for hierarchical designs. When an expression is defined over an extension field, the top-level module contains element signals and submodules for arithmetic operations on those signals. The submodules are generated with corresponding vector signals and the arithmetic operations are now performed on the coordinates. For tower fields, the submodules are generated for the subfield operations, and the design is generated in a top-down fashion. The binding of expressions to the appropriate finite fields or vector spaces and a set of customized methods allow the on the fly generation of expressions for implementation of arithmetic operations, and hence submodule generation. In the light of NIST Lightweight Cryptography Project (LWC), this work focuses mainly on small finite fields. The thesis illustrates the impact of hardware implementation results during the design process of WAGE, a Round 2 candidate in the NIST LWC standardization competition. WAGE is a hardware oriented authenticated encryption scheme. The parameter selection for WAGE was aimed at balancing the security and hardware implementation area, using hardware implementation results for many design decisions, for example field size, representation of field elements, etc. In the proposed framework, the components of WAGE are used as an example to illustrate different automation flows and demonstrate the design space exploration on a real-world algorithm.Item Hardware Implementations of the Lightweight Welch-Gong Stream Cipher Family using Polynomial Bases(University of Waterloo, 2019-01-28) Sattarov, Marat; Aagaard, MarkIn this thesis we develop a parametrized generic hardware implementation for the Welch-Gong (WG) stream cipher family for low power and low cost applications. WG stream ciphers operate over finite fields, and are comprised of Linear Feedback Shift Register (LFSR) and non-linear WG transformation as filtering function. These stream ciphers provide mathematically proven keystream properties. We begin with design of individual components that perform cryptographic functions. Then we construct WG transformation using these components and perform analysis of dependency between de- sign parameters and circuit area pre place-and-route for ASIC and two FPGAs. We also explored a second implementation approach that uses constant arrays or lookup tables generated with GAP by Zidaric. Finally, instances of the complete cipher of different sizes from WG-5 to WG-16 that output from 1 to 32 bits / cycle are shown, and their performance and area is analyzed for 65nm CMOS technology post place-and-route.Item Optimizations and Hardware Implementations for Composited de Bruijn Sequence Generators(University of Waterloo, 2016-01-21) Yang, Bo; Aagaard, MarkA binary de Bruijn sequence with period 2^n is a sequence in which every length-n sub-sequence occurs exactly once. de Bruijn sequences have randomness properties that make them attractive for pseudorandom number generators. Unfortunately, it is very difficult to find de Bruijn sequence generators with large periods (e.g., 2^{64}) and most known de Bruijn sequence construction techniques are computationally quite expensive. In this thesis we present a set of optimizations that reduces the computational complexity of the de Bruijn sequence generators constructed by the composited construction technique, which is the most effective one we know. We call optimized composited de Bruijn sequence generators "OcDeb". An original (k, n)-composited de Bruijn sequence generator generates a sequence with period 2^{n+k} and uses O(k^2 + nk) bit operations. Our optimizations reduce this to O(klog (k) + log (n)) operations, allow retiming, and enable parallel implementations that produce multiple bits per clock cycle while reusing some combinational hardware. Our optimizations are formulated in lemmas and theorems with proofs. The benefits of OcDeb-k-n over (k, n)-composited de Bruijn sequence generators are demonstrate by comprehensive results in a 65nm CMOS ASIC library. For example, before place-and-route, an instance of OcDeb-32-32 has a period of 2^{64}, an area of 656 GE and a maximum performance of 1.67 Gbps, representing 1.7X and 29.4X improvement on area and performance respectively over the previous implementation method presented by Mandal and Gong; with parallelization, this instance can achieve 8.30 Gbps with an area of 1229 GE. An instance of OcDeb-512-32 has a period of 2^{544}, an area of 7949 GE, and a maximum performance of 1.43 Gbps.Item Optimized Hardware Implementations of Lightweight Cryptography(University of Waterloo, 2017-01-20) Yang, Gangqiang; Aagaard, Mark; Gong, GuangRadio frequency identification (RFID) is a key technology for the Internet of Things era. One important advantage of RFID over barcodes is that line-of-sight is not required between readers and tags. Therefore, it is widely used to perform automatic and unique identification of objects in various applications, such as product tracking, supply chain management, and animal identification. Due to the vulnerabilities of wireless communication between RFID readers and tags, security and privacy issues are significant challenges. The most popular passive RFID protocol is the Electronic Product Code (EPC) standard. EPC tags have many constraints on power consumption, memory, and computing capability. The field of lightweight cryptography was created to provide secure, compact, and flexible algorithms and protocols suitable for applications where the traditional cryptographic primitives, such as AES, are impractical. In these lightweight algorithms, tradeoffs are made between security, area/power consumption, and throughput. In this thesis, we focus on the hardware implementations and optimizations of lightweight cryptography and present the Simeck block cipher family, the WG-8 stream cipher, the Warbler pseudorandom number generator (PRNG), and the WGLCE cryptographic engine. Simeck is a new family of lightweight block ciphers. Simeck takes advantage of the good components and design ideas of the Simon and Speck block ciphers and it has three instances with different block and key sizes. We provide an extensive exploration of different hardware architectures in ASICs and show that Simeck is smaller than Simon in terms of area and power consumption. For the WG-8 stream cipher, we explore four different approaches for the WG transformation module, where one takes advantage of constant arrays and the other three benefit from the tower field constructions of the finite field $\F_{2^8}$ and also efficient basis conversion matrices. The results in FPGA and ASICs show that the constant arrays based method is the best option. We also propose a hybrid design to improve the throughput with a little additional hardware. For the Warbler PRNG, we present the first detailed and smallest hardware implementations and optimizations. The results in ASICs show that the area of Warbler with throughput of 1 bit per 5 clock cycles (1/5 bpc) is smaller than that of other PRNGs and is in fact smaller than that of most of the lightweight primitives. We also optimize and improve the throughput from 1/5 bpc to 1 bpc with a little additional area and power consumption. Finally, we propose a cryptographic engine WGLCE for passive RFID systems. We merge the Warbler PRNG and WG-5 stream cipher together by reusing the finite state machine for both of them. Therefore, WGLCE can provide data confidentiality and generate pseudorandom numbers. After investigating the design rationales and hardware architectures, our results in ASICs show that WGLCE meets the constraints of passive RFID systems.