A Ring Oscillator Based Truly Random Number Generator

by

Stewart Robson

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Applied Science in Electrical and Computer Engineering

Waterloo, Ontario, Canada, 2013

© Stewart Robson 2013

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.

Abstract

Communication security is a very important part of modern life. A crucial aspect of security is the ability to identify with near 100% certainty who is on the other side of a connection. This problem can be overcome through the use of random number generators, which create unique identities for each person in a network. The effectiveness of an identity is directly proportional to how random a generator is. The speed at which a random number can be delivered is a critical factor in the design of a random number generator.

This thesis covers the design and fabrication of three ring oscillator based truly random number generators, the first two of which were fabricated in 0.13µm CMOS technology. The randomness from this type of random number generator originates from phase noise in a ring oscillator.

The second and third ring oscillators were designed to have a low slew rate at the inverter switching threshold. The outputs of these designs showed vast increases in timing jitter compared to the first design. The third design exhibited improved randomness with respect to the second design.

Acknowledgements

I would like to thank my supervisors, Dr. Bosco Leung and Dr. Guang Gong, for their time and great assistance with my research.

I would also like to acknowledge my reviewers, Dr. Vincent Gaudet and Dr. Peter Lavine, for their helpful comments and feedback on my work.

I am grateful to fellow graduate student Mohamed Amin for his help and guidance with all Cadence and fabrication related issues, as well as for his advice on the architecture of my circuit designs.

My gratitude goes to Canadian Microelectronics Corporation (CMC) and its staff for fabricating my circuits. Special thanks to Mariusz Jarosz for providing me with the necessary testing equipment.

I would also like to acknowledge and thank Allison Bawden and my family for their love and support.

Dedication

To Allison, Rhonda and John.
# Table of Contents

List of Tables

List of Figures

1 Introduction
   1.1 Thesis Organization

2 Background
   2.1 Random Number Generation
      2.1.1 Linear Feedback Shift Register
      2.1.2 Truly Random Number Generator
   2.2 Randomness Tests
      2.2.1 Frequency Test
      2.2.2 Frequency within a Block Test
      2.2.3 Runs Test
      2.2.4 Longest Run of Ones
      2.2.5 Discrete Fourier Transform Test
      2.2.6 Serial Test
      2.2.7 Approximate Entropy
      2.2.8 Cumulative Summation Test
      2.2.9 Poker Test
   2.3 Definition of Phase Noise and Timing Jitter
   2.4 Phase and Jitter Models for Ring Oscillators
      2.4.1 First Passage Time
      2.4.2 Last Passage Time
   2.5 Impact of Phase Noise on Random Number Generators

3 Truly Random Number Generator
   3.1 Fast Ring Oscillator Design
      3.1.1 Transistor Level Simulation
   3.2 Design 1 - Current-Starved Voltage Controlled Oscillator
      3.2.1 Transistor Level Simulation
   3.3 Design 2 - Current-Stealing VCO
      3.3.1 Jitter Calculation
      3.3.2 Transistor Level Simulation
   3.4 Design 3 - Current-Stealing VCO with Modifications
   3.5 D Flip-Flop
   3.6 Simulation Jitter Summary

4 TRNG Transistor Level Simulation
   4.1 Design 1
      4.1.1 Transistor Level Simulation
      4.1.2 Randomness Test
   4.2 Design 2
      4.2.1 Transistor Level Simulation
      4.2.2 Randomness Tests
   4.3 Design 3
      4.3.1 Randomness Tests

5 Fabrication and Testing
   5.1 Buffer Design
      5.1.1 Layout
      5.1.2 Parasitic Extraction and Simulations
   5.2 Design 1
      5.2.1 Layout
      5.2.2 Parasitic Extraction and Simulations
   5.3 Design 2
      5.3.1 Layout
      5.3.2 Parasitic Extraction and Simulations
   5.4 Layout considerations
      5.4.1 ESD protection
   5.5 PCB Layout
   5.6 Testing
      5.6.1 Design 1
      5.6.2 Design 2
      5.6.3 Summary

6 Conclusions
   6.1 Future work

References
## List of Tables

<table>
<thead>
<tr>
<th>Table</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>2.1</td>
<td>One period of a LFSR in Figure 2.1.</td>
</tr>
<tr>
<td>2.2</td>
<td>Long run frequency bins.</td>
</tr>
<tr>
<td>3.1</td>
<td>Jitter calculation for current-starved VCO</td>
</tr>
<tr>
<td>3.2</td>
<td>Jitter calculation for current-stealing VCO</td>
</tr>
<tr>
<td>3.3</td>
<td>Transistor sizing chart for Design 2 delay cell.</td>
</tr>
<tr>
<td>3.4</td>
<td>Transistor sizing chart for Design 3 delay cells</td>
</tr>
<tr>
<td>3.5</td>
<td>Summary of timing jitter from Eldo simulations</td>
</tr>
<tr>
<td>4.1</td>
<td>Summary of randomness tests for Design 1</td>
</tr>
<tr>
<td>4.2</td>
<td>Summary of randomness tests for Design 2</td>
</tr>
<tr>
<td>4.3</td>
<td>Summary of randomness tests for Design 3</td>
</tr>
<tr>
<td>5.1</td>
<td>Summary of DPOjet jitter stats for Design 1 for clean clock</td>
</tr>
<tr>
<td>5.2</td>
<td>Summary of DPOjet jitter stats for Design 1 for regular clock</td>
</tr>
<tr>
<td>5.3</td>
<td>Summary of randomness tests for chip output of Design 1</td>
</tr>
<tr>
<td>5.4</td>
<td>Summary of DPOjet jitter stats for Design 3 clock from chip</td>
</tr>
<tr>
<td>5.5</td>
<td>Summary of randomness tests for chip output of Design 2.</td>
</tr>
</tbody>
</table>

## List of Figures

2.1 An example of a 3-bit LFSR with maximal feedback polynomial $x^3 + x^2 + 1$.

2.2 Direct amplification random number generator.

2.3 System level design of TRNG using phase noise.

2.4 FO and SO waveforms with timing jitter PDF.

2.5 Metastability-based TRNG using two inverters.

2.6 Frequency spectrum plots for (a) an ideal periodic signal with frequency $f_c$ and (b) periodic signal with phase noise.

2.7 One period of oscillation with jitter included.

2.8 Time domain plot of absolute jitter.

2.9 Threshold crossing plot for FPT.

2.10 Schematic of a simple inverter delay-cell with noise current.

2.11 Threshold crossing plot for LPT.

3.1 A simple 3-stage ring oscillator.

3.2 Transient graph of the ring oscillator from Figure 3.1.

3.3 Delay cell for a fast 3-stage ring oscillator.

3.4 Transient simulation of the simple inverter ring oscillator. Frequency = 9.51GHz.

3.5 Transistor level schematic of one delay cell for a current-starved inverter VCO.

3.6 Eldo transient simulation of current-starved ring oscillator. Frequency = 76.4MHz.
3.7 Zoomed-in view of the threshold crossing spread after one period of the current-starved VCO with 250 noise runs.

3.8 Threshold crossing histogram of Figure 3.7 at 0.6V.

3.9 System design of the current stealing delay cell.

3.10 Transistor level schematic for one current stealing delay cell.

3.11 Transient operation of a current-stealing VCO.

3.12 Zoomed-in view of the threshold crossing spread after one period of the current-stealing VCO with 250 noise runs.

3.13 Threshold crossing histogram of Figure 3.12 at 0.685V.

3.14 Block diagram of Design 3.

3.15 Transistor level schematic for the main path of the Design 3 delay cell.

3.16 Waveform of one period of the Design 3 VCO.

3.17 Timing jitter distribution for Design 3 at 300 noise runs and a threshold of 0.8V.

3.18 Sense amplifier DFF schematic.

4.1 Simulation test bench for designs 1 and 2.

4.2 Transistor level simulation of Design 1.

4.3 Four-bit distribution poker test for Design 1.

4.4 Transistor level simulation for Design 2.

4.5 Four-bit distribution poker test for Design 2.

4.6 Four-bit distribution poker test for Design 3.

5.1 Layout of 8-stage slow-speed buffer.

5.2 Layout of 13-stage high-speed buffer.

5.3 Input and output signal for the slow-speed buffer at 200MHz with a 15pF load.

5.4 Input and output signal for the slow-speed buffer at 1GHz with a 15pF load.

5.5 Input and output signal for the high-speed buffer at 1GHz with a 15pF load.

5.6 Layout of Design 1 TRNG.

5.7 Layout of one current-starved delay cell.

5.8 Design 1 full extraction simulation with 15pF load on each output.

5.9 Layout of Design 2 TRNG.

5.10 Layout of one current-stealing delay cell.

5.11 Design 2 full extraction simulation with 15pF load on each output.

5.12 Full submitted chip layout for ICGWTRNG in 0.13um IBM technology.

5.13 Schematic for the double-diode ESD protection.

5.14 Screen shot of PCB design for testing the chip.

5.15 Fast RO output from Design 1. Running frequency = 923MHz.

5.16 On-chip Design 1 clock waveform with FO turned off. Running frequency = 72.4MHz.

5.17 Threshold crossing histogram for a clean CLK signal with D turned off.

5.18 DPOJet eye-diagram and time interval error distribution of the clean clock waveform.

5.19 Screen shot of PCB design for testing the chip. Running frequency = 72.4MHz.

5.20 Threshold crossing histogram for a clean CLK signal with D turned off.

5.21 DPOJet eye-diagram and time interval error distribution of the clean clock waveform.

5.22 Design 1 clock and Q output from the chip.

5.23 Four-bit distribution poker test for chip output for Design 1.

5.24 Design 2 clock output from chip. Frequency = 61MHz.

5.25 Threshold crossing histogram for the Design 2 clock from chip.

5.26 DPOJet eye-diagram and time interval error distribution of the Design 2 clock from chip.

5.27 Design 2 CLK and Q output from the chip.

5.28 Four-bit distribution of poker test for chip output for Design 2.

Chapter 1

Introduction

Truly random number generators are a crucial part of everyday life in most modern cultures. In this information age, people send emails, call or message friends and make online transactions millions of times per day. Each of these everyday processes is assumed to be safe and confidential. The security of communication depends on the ability of these processes to verify that the people communicating are actually who they say they are. Security can only be accomplished through the distribution of private identities, known as keys, that are known only by the individual user, so that malicious entities cannot impersonate anyone or cause some form of harm. A private key is a large randomly generated number that is unique to the user. To establish a safe connection, a public identity, or public key, that can be shared with others is created. An example of how to establish a safe connection is illustrated by the Diffie-Hellman Key Exchange protocol in [1]. In that protocol, a public key is created by raising a publicly known base to the power of the user's private key and reducing the result modulo a large prime number. Recovering the private key from the public key is computationally hard, so the original key cannot be obtained easily. The randomness of private key numbers determines how safe the actual connections and public keys are from attacks and impersonations. The ability to generate random numbers is therefore a very important part of the security of communication systems.

Most random numbers in cryptographic systems are generated using a linear feedback shift register (LFSR) or a combination of LFSRs. A simple LFSR is an $n$-bit long shift register with a series of XOR logic gates fed back to the first register. A maximal-length LFSR will output every number from 1 to $2^n - 1$, where $n$ is the number of registers in the LFSR, and the order in which these numbers are output is determined by the feedback portion. The output of an LFSR is periodic and will follow a pattern indicated by the feedback; because this is a deterministic process, it is known as a pseudo-random number generator (PRNG). Many large numbers can be accessed quickly using an LFSR, but what makes these numbers unpredictable is the starting position, or seed. A truly random number generator (TRNG) is used to determine this point and can be designed in a number of different manners. Within a computing environment, many natural phenomena can be used to create a TRNG, including the number of mouse clicks and their locations on the screen or the number of times a hard drive is accessed within a certain period of time. Other methods develop hardware to create this randomness. This thesis focuses on using the phase noise of a voltage controlled oscillator (VCO) to create this randomness.

1.1 Thesis Organization

This thesis consists of six chapters. Chapter 2 covers the background theory of the oscillator-based TRNG. The different components of the TRNG, including large-noise VCOs, are discussed in Chapter 3. Chapter 4 illustrates the results of the TRNG system level tests. The extracted design from the microchip is analyzed in Chapter 5. A discussion of conclusions and recommended improvements is provided in Chapter 6.

Chapter 2

Background

2.1 Random Number Generation

Random number generators (RNGs) are used to create private keys in modern communication security systems. There are two broad types of RNGs: the TRNG, which was the type designed for this thesis, and the PRNG. The TRNG uses real world random occurrences, such as the number of times a computer hard drive is accessed, the number of mouse clicks a user makes, or the thermal noise produced by the circuits themselves, to generate a stream of completely random numbers, or bits. PRNGs are more common because they are easy to implement using an LFSR based structure, which generates a number sequence from a predetermined list of numbers based on the LFSR feedback function. The randomness comes from selecting a number in the stream that is some value away from the seed, or initial value, generated by a TRNG. LFSRs can be implemented in both hardware and software.

2.1.1 Linear Feedback Shift Register

An LFSR is simply a shift register where the input for the next clock edge is generated from some algebraic combination of the register’s current contents [1]. A simple LFSR is given in Figure 2.1. Its size is three bits and the feedback polynomial is $x^3 + x^2 + 1$. This means that the third and second bits are XORed to provide the feedback to the first. This is a maximal-length feedback polynomial because it produces the longest possible sequence of distinct numbers for an $n$-bit LFSR, which is $2^n - 1$.

![Figure 2.1: An example of a 3-bit LFSR with maximal feedback polynomial $x^3 + x^2 + 1$.](image)

The entire output sequence can be seen in Table 2.1, which shows the $2^3 - 1 = 7$ distinct numbers that repeat periodically.

Table 2.1: One period of a LFSR in Figure 2.1.

<table>
<thead>
<tr>
<th>Iteration</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>seed</td>
<td>110</td>
</tr>
<tr>
<td>1</td>
<td>100</td>
</tr>
<tr>
<td>2</td>
<td>001</td>
</tr>
<tr>
<td>3</td>
<td>010</td>
</tr>
<tr>
<td>4</td>
<td>101</td>
</tr>
<tr>
<td>5</td>
<td>011</td>
</tr>
<tr>
<td>6</td>
<td>111</td>
</tr>
<tr>
<td>7</td>
<td>110</td>
</tr>
</tbody>
</table>
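The sequence in Table 2.1 can be reproduced with a few lines of Python. This is only a sketch: the bit numbering (bit 3 is the leftmost bit) and the shift direction (shift left, feedback enters on the right) are assumptions inferred from the table rather than taken from Figure 2.1, and the function names are hypothetical.

```python
def lfsr3_step(state):
    """Advance a 3-bit LFSR with polynomial x^3 + x^2 + 1 by one clock.

    `state` is an integer in 1..7. Bit 3 is assumed to be the leftmost
    (most significant) bit and bit 1 the rightmost, as inferred from Table 2.1.
    """
    bit3 = (state >> 2) & 1
    bit2 = (state >> 1) & 1
    feedback = bit3 ^ bit2                      # XOR of the third and second bits
    return ((state << 1) & 0b111) | feedback    # shift left, feed back into bit 1


def lfsr3_sequence(seed, steps):
    """Return the seed followed by `steps` successive LFSR states."""
    states = [seed]
    for _ in range(steps):
        states.append(lfsr3_step(states[-1]))
    return states


# Starting from the seed 110, seven clocks return the register to 110.
print([format(s, "03b") for s in lfsr3_sequence(0b110, 7)])
```

Because the all-zero state maps to itself, any non-zero seed walks through the same cycle of seven states.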

2.1.2 Truly Random Number Generator

For the TRNG designed in this thesis, thermal noise was used to generate randomness. There are three main ways to use thermal noise to generate random bits [2, 3]. The first is to amplify the resistor thermal noise and then compare it to the DC value of the amplifier output. The final output of the comparator will be random. This design is illustrated below in Figure 2.2.

The second method for generating a random bit-stream is to use the phase noise of an oscillator to create a random noisy clock input to a delay flip-flop (DFF) that has a fast oscillating D input. If the clock is noisy enough, the rising edge of the clock is highly uncertain and the output will be random. A block diagram of a system that implements this is given in Figure 2.3. It consists of two oscillators and a DFF. One oscillator goes to the clock input while the other goes to the D input. If the D input oscillator (denoted as the fast oscillator [FO]) is fast enough compared to the clock input oscillator (denoted as the slow oscillator [SO]) such that the timing jitter (discussed in Section 2.3) of the SO is the same length of time as the period of the FO, the output bit will be equally likely to be a zero or a one. This assumes that the FO has a perfect 50% duty cycle.

Figure 2.2: Direct amplification random number generator.

An added concern when designing for randomness using the second TRNG method is whether the next value of Q can be determined from a known clock edge, given that the average frequencies of both the D and clock inputs are known. This problem is illustrated in Figure 2.4, which shows the probability density function (pdf) for the clock jitter as well as the D and clock input waveforms. For a random output, the chance of the output being a one or a zero should be equal, i.e. 50% each. Using the pdf, it is known that the total area under the curve is equal to 1, corresponding to 100% of all clock edge threshold crossings. The probability of the D input equalling 1 when the clock edge rises is \( P(D = 1) = P(a < Z < b) + P(c < Z < d) \). Similar to a Z-test, \( P(a < Z < b) \) and \( P(c < Z < d) \) are equal to the area of the shaded regions a-b and c-d, respectively, over the whole area. From this, it can be determined that the standard deviation of the jitter should be wide enough such that the combined sum of the shaded regions on the pdf will be equal to 0.5 or 50% of the pdf. In other words, the value D will be equally likely to be a one or a zero [4].

Figure 2.4: FO and SO waveforms with timing jitter PDF.
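The area argument above can be checked numerically. The sketch below assumes that the clock-edge time Z is Gaussian and that the FO is an ideal square wave with a 50% duty cycle that is high during the first half of every period; the function and parameter names are hypothetical and the numbers are illustrative, not taken from the designs in this thesis.

```python
import math


def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def prob_d_is_one(sigma, fo_period, mean_offset=0.0, n_periods=200):
    """P(D = 1) when the clock edge Z ~ N(mean_offset, sigma^2).

    The FO is assumed high on [k*T, k*T + T/2) for every integer k, so the
    result is the sum of the pdf areas over all of those intervals.
    """
    total = 0.0
    for k in range(-n_periods, n_periods + 1):
        a = k * fo_period
        b = a + fo_period / 2.0
        total += normal_cdf(b, mean_offset, sigma) - normal_cdf(a, mean_offset, sigma)
    return total


# Jitter much smaller than the FO period: the sampled bit is almost deterministic.
print(prob_d_is_one(sigma=0.01, fo_period=1.0, mean_offset=0.25))
# Jitter comparable to the FO period: the sampled bit is close to a fair coin.
print(prob_d_is_one(sigma=1.0, fo_period=1.0, mean_offset=0.25))
```

Under these assumptions the result approaches 0.5 once the jitter standard deviation is comparable to the FO period, regardless of where the mean clock edge falls.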

The last method to create a TRNG is to employ a metastable circuit that uses noise to push the output to one state or another. One design which is covered extensively by Intel is shown in Figure 2.5 [5].

Figure 2.5: Metastability-based TRNG using two inverters.

The operation of this TRNG is simple in theory: two inverters are connected to each other’s inputs. This type of configuration might usually be used as a refresher to hold the output for a dynamic latch to stop leakage, but since both are connected to $V_{dd}$ through clock-controlled transistors, both the inputs and outputs will go high when the clock goes low. When the clock goes high and disconnects $V_{dd}$, each side forces the other down towards half of $V_{dd}$, since both inverters act on each other’s input. This halfway point is the metastable state and the TRNG will stay here until thermal noise causes one inverter to overpower the other, forcing the output of the stronger inverter to go to zero and causing its input to swing to one. The challenging aspect of this configuration is making sure that it is highly resistant to process-voltage-temperature (PVT) variation; if the switching thresholds are not identical and exactly $V_{dd}/2$, the metastable state will never be reached and the randomness of this TRNG will be ruined.

2.2 Randomness Tests

If the output bit-stream of a TRNG is predictable, for example because it always contains a large percentage of ones, an attacker could more easily determine the seed and thus crack the PRNG and decrypt the data. This would make for a very poor TRNG. To avoid this issue, there is a suite of tests that can be performed on a stream of bits to determine, using statistical analysis, whether the randomness is acceptable. This package of tests was assembled by the National Institute of Standards and Technology (NIST) for application in the testing of random number generators [6, 7]. For each test, a data bit-stream ($\epsilon$) with length $n = 20,000$ was used. This particular length was chosen because it was known to be achievable by the available lab testing equipment. Since the longest bit-stream possible was 20,000 bits, certain tests in the NIST package were excluded due to lack of accuracy.

Each test generates a one-tail probability (P-value) for the null hypothesis that the bit-stream given is random. A confidence level of 99% was used as outlined in [6]; a P-value of at least 0.01 would therefore result in a pass for that particular test. As another measure of precaution, NIST recommends that when using a 99% confidence level, 100 bit-streams of 20,000 bits be used from the number generator to verify that it is indeed random. If fewer than 100 bit-streams are tested, a lower confidence level should be used. For a 99% confidence level including standard deviation, 96 of the 100 bit-streams must pass for the number generator to be considered random. Some of the more advanced statistical functions are outlined in the NIST reference. In this section, the randomness tests used in this work are introduced.
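The figure of 96 out of 100 can be reproduced from the acceptance interval for the proportion of passing bit-streams. The following is a sketch of that arithmetic, assuming the three-standard-deviation interval described in the NIST documentation, with an expected pass proportion $\hat{p} = 0.99$ and $m = 100$ bit-streams (both symbols are introduced here for illustration):

$$\hat{p} - 3\sqrt{\frac{\hat{p}(1-\hat{p})}{m}} = 0.99 - 3\sqrt{\frac{0.99 \times 0.01}{100}} \approx 0.99 - 0.030 = 0.960$$

so approximately 96 of the 100 bit-streams must pass.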

2.2.1 Frequency Test

The purpose of the frequency test is to assess the distribution of ones and zeros in the bit-stream output. Ideally, for a random sequence, there should be the same number of ones as zeros, but that will not always be the case and the test suite outlines the acceptable error.

The procedure for the frequency test is to use Equation (2.1) to solve for the P-value:

\[
P = \text{erfc}\left(\frac{\left|\sum_{i=1}^{n} (2\epsilon_i - 1)\right|}{\sqrt{2n}}\right) \tag{2.1}
\]

where \(\epsilon_i\) is the bit in the \(i^{th}\) position of the bit-stream, \(n\) is the length of the bit-stream and \(\text{erfc}(z)\) is the complementary error function. The term \(2\epsilon_i - 1\) maps each zero to \(-1\) and each one to \(+1\), so the sum is the number of ones minus the number of zeros.

The frequency test is passed if there is no evidence to indicate that the tested sequence is non-random, i.e. the P-value is greater than or equal to 0.01 (a 99% confidence level). For a bit-stream of length 20,000, the number of ones may therefore differ from the number of zeros by at most 364.
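As an illustration, the frequency test can be sketched in a few lines of Python. The sketch uses the NIST form of the statistic, erfc(|S| / sqrt(2n)), where S is the number of ones minus the number of zeros; the function name is hypothetical and the input is assumed to be a list of 0/1 integers.

```python
import math


def frequency_test(bits):
    """Return the frequency (monobit) test P-value for a list of 0/1 integers."""
    n = len(bits)
    # Map 0 -> -1 and 1 -> +1; the sum is (number of ones) - (number of zeros).
    s = sum(2 * b - 1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(2 * n))


# A 20,000-bit stream with 364 more ones than zeros sits just inside the pass region.
borderline = [1] * 10182 + [0] * 9818
print(frequency_test(borderline) >= 0.01)
```

The order of the bits does not matter for this test, which is why the later tests in this section are needed to catch patterned but balanced streams.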

2.2.2 Frequency within a Block Test

The frequency within a block test involves determining how many ones are within a block of length $M$ bits and comparing this number to the frequency expected under the assumption of truly random input, $M/2$. The number of blocks $N$ is defined as the length of the bit-stream $n$ divided by the length of each block $M$. For these tests, $n$ was set to 20,000 and $M$ was set to 0.01$n$, therefore $M$ was determined to be 200 and the total number of blocks inspected $N$ was 100. The frequency within a block test involves first calculating the proportion of ones in each block:

$$\pi_i = \frac{\sum_{j=1}^{M} \epsilon_{(i-1)M+j}}{M} \tag{2.2}$$

How close the proportions are to 50% is then determined:

$$\chi^2_{obs} = 4M \sum_{i=1}^{N} \left(\pi_i - \frac{1}{2}\right)^2 \tag{2.3}$$

$$P = Q \left( \frac{N}{2}, \frac{\chi^2_{obs}}{2} \right) \tag{2.4}$$

The Q function is the complementary incomplete gamma function. The frequency within a block test is passed if the P-value is greater than or equal to 0.01.
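Equations (2.2) to (2.4) can be sketched as follows. To stay within the Python standard library, the sketch evaluates the Q function with the closed form that holds when its first argument is a positive integer, so it assumes an even number of blocks (as with $N = 100$ here); the function names are hypothetical.

```python
import math


def q_integer_a(a, x):
    """Q(a, x) for a positive integer a: exp(-x) * sum_{k=0}^{a-1} x^k / k!.

    Intended for moderate x; this closed form does not apply to non-integer a.
    """
    term = 1.0
    total = 1.0
    for k in range(1, a):
        term *= x / k
        total += term
    return math.exp(-x) * total


def block_frequency_test(bits, block_len):
    """Return the frequency-within-a-block P-value for a list of 0/1 integers."""
    n_blocks = len(bits) // block_len          # N; leftover bits are ignored
    if n_blocks == 0 or n_blocks % 2 != 0:
        raise ValueError("this sketch needs an even, non-zero number of blocks")
    chi_sq = 0.0
    for i in range(n_blocks):
        block = bits[i * block_len:(i + 1) * block_len]
        pi_i = sum(block) / block_len          # Equation (2.2)
        chi_sq += (pi_i - 0.5) ** 2
    chi_sq *= 4 * block_len                    # Equation (2.3)
    return q_integer_a(n_blocks // 2, chi_sq / 2)   # Equation (2.4)


# A perfectly balanced stream gives chi-square = 0 and therefore P = 1.
print(block_frequency_test([0, 1] * 10000, 200))
```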
2.2.3 Runs Test

The runs test looks for long strings of either ones or zeros that are uninterrupted. It will analyze the bit-stream to determine if the oscillation between zeros and ones is occurring too quickly (a deterministic signal that resembles a clock) or too slowly (a constant dc signal that is also deterministic). The total number of runs, which is one more than the number of switches, can be determined by the following two equations:

\[ V_{n_{\text{obs}}} = \sum_{k=1}^{n-1} r(k) + 1 \]  \hspace{1cm} (2.5)

\[ r(k) = \begin{cases} 
0 & \text{if } \epsilon_k = \epsilon_{k+1} \\
1 & \text{otherwise} 
\end{cases} \]  \hspace{1cm} (2.6)

The deciding criteria for the passing of this test can be obtained through Equation (2.7).

\[ P = \text{erfc} \left( \frac{|V_{n_{\text{obs}}} - 2n\pi(1 - \pi)|}{2\sqrt{2n}\,\pi(1 - \pi)} \right) \]  \hspace{1cm} (2.7)

where erfc is the complementary error function, \( V_{n_{\text{obs}}} \) is the total number of runs in the bit-stream and \( \pi \) is the proportion of ones in the whole stream. At a 99% confidence level, the number of switches for a 20,000 bit-stream of data was determined to lie within 9,816 and 10,180 switches.
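A minimal Python sketch of the runs test is given below. It follows the NIST form of the statistic, in which the factor $\pi(1 - \pi)$ in the denominator sits outside the square root; the function name is hypothetical and the input is assumed to be a list of 0/1 integers.

```python
import math


def runs_test(bits):
    """Return the runs test P-value for a list of 0/1 integers."""
    n = len(bits)
    pi = sum(bits) / n                      # proportion of ones
    if pi == 0.0 or pi == 1.0:
        return 0.0                          # constant stream: a single run
    # Total number of runs = number of switches between adjacent bits + 1.
    v_obs = 1 + sum(1 for k in range(n - 1) if bits[k] != bits[k + 1])
    numerator = abs(v_obs - 2 * n * pi * (1 - pi))
    denominator = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(numerator / denominator)


# A clock-like stream switches on every bit and is rejected.
print(runs_test([0, 1] * 10000) < 0.01)
```

For a balanced 20,000-bit stream the expected number of runs is 10,000, and the denominator evaluates to 100, which is consistent with a pass band of roughly 182 runs either side of the expected value.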

2.2.4 Longest Run of Ones

The longest run of ones test finds the longest run of ones within each block of length \( M \) bits. The distribution of these longest runs is then compared to the expected distribution for a random sequence. For the bit-stream length used in these tests, a block length of 128 bits was used. The longest run in each block was determined and counted into the bins outlined in Table 2.2.

Table 2.2: Long run frequency bins.

<table>
<thead>
<tr>
<th>$v_i$</th>
<th>Run Length</th>
</tr>
</thead>
<tbody>
<tr>
<td>$v_0$</td>
<td>$\leq 4$</td>
</tr>
<tr>
<td>$v_1$</td>
<td>5</td>
</tr>
<tr>
<td>$v_2$</td>
<td>6</td>
</tr>
<tr>
<td>$v_3$</td>
<td>7</td>
</tr>
<tr>
<td>$v_4$</td>
<td>8</td>
</tr>
<tr>
<td>$v_5$</td>
<td>$\geq 9$</td>
</tr>
</tbody>
</table>

Using these frequencies, the chi-square value was obtained:

$$\chi^2(\text{obs}) = \sum_{i=0}^{K} \frac{(v_i - N\pi_i)^2}{N\pi_i} \tag{2.8}$$

where $K=5$ and $N=49$ for $M=128$, and $\pi_i$ is the probability that the longest run of a block falls in bin $v_i$ for a random sequence. The P-value was found with the complementary incomplete gamma function:

$$P = Q \left( \frac{K}{2}, \frac{\chi^2(\text{obs})}{2} \right). \tag{2.9}$$
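The binning of Table 2.2 and the statistic of Equation (2.8) can be sketched as follows. The bin probabilities are not listed in this text; the values below are the ones quoted for $M = 128$ in the NIST documentation and should be treated as an assumption, as should the choice to use only the first $N = 49$ blocks. Function names are hypothetical, and the P-value step of Equation (2.9) is left out because it needs a general incomplete gamma routine.

```python
# Assumed bin probabilities for M = 128 (quoted from the NIST documentation).
PI_M128 = [0.1174, 0.2430, 0.2493, 0.1752, 0.1027, 0.1124]


def longest_run_of_ones(block):
    """Length of the longest uninterrupted run of ones in a block."""
    longest = current = 0
    for b in block:
        current = current + 1 if b == 1 else 0
        longest = max(longest, current)
    return longest


def longest_run_chi_sq(bits, block_len=128, n_blocks=49):
    """Return (bin counts v_0..v_5, chi-square) over the first n_blocks blocks."""
    if len(bits) < block_len * n_blocks:
        raise ValueError("bit-stream too short for the requested blocks")
    v = [0] * 6
    for i in range(n_blocks):
        run = longest_run_of_ones(bits[i * block_len:(i + 1) * block_len])
        # Table 2.2: <=4 -> v_0, 5 -> v_1, ..., 8 -> v_4, >=9 -> v_5.
        v[min(max(run, 4), 9) - 4] += 1
    chi_sq = sum((v[i] - n_blocks * PI_M128[i]) ** 2 / (n_blocks * PI_M128[i])
                 for i in range(6))
    return v, chi_sq


# Every block below has a longest run of exactly six ones, so all counts land in v_2.
block = [1] * 6 + [0] * 122
print(longest_run_chi_sq(block * 49))
```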

### 2.2.5 Discrete Fourier Transform Test

The purpose of the discrete Fourier transform (DFT) test is to convert the bit-stream into a spectral graph to determine if there are any high peaks, indicating recurring or periodic patterns.

The DFT test involves first converting all zeros in $\epsilon$ to -1. The magnitude, $M$, of the first $n/2$ components of the DFT of the new bit-stream is then calculated. The 95% threshold value, $T$, is determined by:

$$T = \sqrt{n \log \frac{1}{0.05}} \tag{2.10}$$

Assuming the bit-stream is random, 95% of the values in $M$ should not exceed this value. The normalized difference between the observed and expected number of frequency components below the threshold, $d$, is then calculated:

$$d = \frac{(N_1 - N_0)}{\sqrt{n(0.95)(0.05)/4}} \tag{2.11}$$

where $N_0 = 0.95n/2$ is the expected number of points below the value $T$ and $N_1$ is the actual observed number. The P-value is found using the complementary error function:

$$P = \text{erfc}\left(\frac{|d|}{\sqrt{2}}\right). \tag{2.12}$$
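The steps of the DFT test can be sketched in plain Python. The sketch uses a direct $O(n^2)$ DFT so that it needs only the standard library, which is fine for short illustrative inputs but too slow for 20,000 bits, where an FFT routine would normally be used. It counts the peaks below $T$ among the first $n/2$ components, following the NIST definition of $N_1$; the function name is hypothetical.

```python
import cmath
import math


def dft_test(bits):
    """Return (N_1, P-value) of the DFT test for a list of 0/1 integers."""
    n = len(bits)
    x = [2 * b - 1 for b in bits]                      # zeros become -1
    threshold = math.sqrt(n * math.log(1 / 0.05))      # Equation (2.10), natural log
    n1 = 0
    for freq in range(n // 2):                         # first n/2 components only
        component = sum(x[k] * cmath.exp(-2j * cmath.pi * freq * k / n)
                        for k in range(n))
        if abs(component) < threshold:
            n1 += 1
    n0 = 0.95 * n / 2
    d = (n1 - n0) / math.sqrt(n * 0.95 * 0.05 / 4)     # Equation (2.11)
    return n1, math.erfc(abs(d) / math.sqrt(2))        # Equation (2.12)


# A stream with period 4 concentrates its energy in a single spectral peak.
print(dft_test([1, 1, 0, 0] * 4))
```

In the example, seven of the eight inspected components are essentially zero and one exceeds the threshold, which is the kind of isolated peak the test is designed to expose.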

### 2.2.6 Serial Test

Similar to the frequency test, the serial test checks the frequency of $m$-bit patterns and compares them to the expected number for an assumed random sequence. If $m = 1$, this test is identical to the frequency test.

The serial test uses three different block lengths: $m$, $(m-1)$ and $(m-2)$. A new bit-stream is obtained for each block length by appending the first (block length - 1) bits to the end. This creates exactly $n$ overlapping blocks for each block length. The frequencies of all overlapping $m$-, $(m-1)$- and $(m-2)$-bit blocks are counted and denoted as $v_{i_1 \ldots i_m}$, $v_{i_1 \ldots i_{m-1}}$ and $v_{i_1 \ldots i_{m-2}}$, respectively. Equations (2.13) and (2.14) are used to prepare to solve for the P-values:

\[
\Psi_m^2 = \frac{2^m}{n} \sum_{i_1 \ldots i_m} \left( v_{i_1 \ldots i_m} - \frac{n}{2^m} \right)^2
\]

\[
\Psi_{m-1}^2 = \frac{2^{m-1}}{n} \sum_{i_1 \ldots i_{m-1}} \left( v_{i_1 \ldots i_{m-1}} - \frac{n}{2^{m-1}} \right)^2
\]

\[
\Psi_{m-2}^2 = \frac{2^{m-2}}{n} \sum_{i_1 \ldots i_{m-2}} \left( v_{i_1 \ldots i_{m-2}} - \frac{n}{2^{m-2}} \right)^2
\]  \hspace{1cm} (2.13)

\[
\nabla \Psi_m^2 = \Psi_m^2 - \Psi_{m-1}^2
\]

\[
\nabla^2 \Psi_m^2 = \Psi_m^2 - 2\Psi_{m-1}^2 + \Psi_{m-2}^2.
\]  \hspace{1cm} (2.14)

Both P-values from Equation (2.15) must be greater than or equal to 0.01 to pass this test.

\[
P_1 = Q\left(2^{m-2}, \frac{\nabla \Psi_m^2}{2}\right)
\]

\[
P_2 = Q\left(2^{m-3}, \frac{\nabla^2 \Psi_m^2}{2}\right).
\]  \hspace{1cm} (2.15)

Q is the complementary incomplete gamma function.
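The serial test can be sketched as follows. The sketch halves both differences before passing them to the Q function, as in the NIST definition, and it assumes $m \geq 3$ so that both first arguments of Q are positive integers and a closed form can be used. Function names are hypothetical and the input is assumed to be a list of 0/1 integers.

```python
import math


def q_integer_a(a, x):
    """Q(a, x) for a positive integer a: exp(-x) * sum_{k=0}^{a-1} x^k / k!."""
    term = 1.0
    total = 1.0
    for k in range(1, a):
        term *= x / k
        total += term
    return math.exp(-x) * total


def psi_sq(bits, m):
    """Psi^2_m of Equation (2.13) for overlapping m-bit blocks with wrap-around."""
    if m <= 0:
        return 0.0
    n = len(bits)
    extended = bits + bits[:m - 1]           # append the first m-1 bits
    counts = {}
    for i in range(n):
        pattern = tuple(extended[i:i + m])
        counts[pattern] = counts.get(pattern, 0) + 1
    expected = n / 2 ** m
    missing = 2 ** m - len(counts)           # patterns that never occurred
    total = sum((c - expected) ** 2 for c in counts.values())
    total += missing * expected ** 2
    return (2 ** m / n) * total


def serial_test(bits, m):
    """Return (P_1, P_2) of the serial test; this sketch requires m >= 3."""
    if m < 3:
        raise ValueError("this sketch assumes m >= 3")
    psi_m, psi_m1, psi_m2 = psi_sq(bits, m), psi_sq(bits, m - 1), psi_sq(bits, m - 2)
    del1 = psi_m - psi_m1                    # Equation (2.14), first difference
    del2 = psi_m - 2 * psi_m1 + psi_m2       # Equation (2.14), second difference
    return (q_integer_a(2 ** (m - 2), del1 / 2),
            q_integer_a(2 ** (m - 3), del2 / 2))


print(serial_test([0, 0, 1, 1, 0, 1, 1, 1, 0, 1], 3))
```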

2.2.7 Approximate Entropy

The approximate entropy test entails counting the frequency of \( m \)- and \((m+1)\)-bit strings and comparing these results against the expected frequency from a random sequence. First, for the \( m \)-bit block length, the first \( m-1 \) bits of the bit-stream are appended to its end such that there are exactly \( n \) overlapping \( m \)-bit blocks. The frequency of each \( m \)-bit number that occurs is counted from all \( n \) blocks and is represented as \( \#i \), where \( i \) is the decimal number from 0 to \( 2^m - 1 \). The ratio of each count to \( n \) is determined by \( C_i^m = \frac{\#i}{n} \), and these ratios are combined as:

\[ \Phi^{(m)} = \sum_{i=0}^{2^m-1} \pi_i \log \pi_i \tag{2.16} \]

where \( \pi_i = C_i^m \). This is then repeated for \( m+1 \) to find \( \Phi^{(m+1)} \). The \( \chi^2 \) statistic in Equation (2.17) is used to compare the observed values to the expected values for randomness:

\[ \chi^2 = 2n\left[\log 2 - \left(\Phi^{(m)} - \Phi^{(m+1)}\right)\right] \tag{2.17} \]

The P-value is found using the complementary incomplete gamma function:

\[ P = Q\left(2^{m-1}, \frac{\chi^2}{2}\right) \tag{2.18} \]
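A short Python sketch of Equations (2.16) and (2.17) is given below. Patterns that never occur are skipped, which is equivalent to treating $0 \log 0$ as zero. The function names are hypothetical, the input is assumed to be a list of 0/1 integers, and the final P-value step of Equation (2.18) is omitted for brevity.

```python
import math


def phi(bits, m):
    """Phi^(m) of Equation (2.16), using overlapping m-bit blocks with wrap-around."""
    n = len(bits)
    extended = bits + bits[:m - 1]           # append the first m-1 bits
    counts = {}
    for i in range(n):
        pattern = tuple(extended[i:i + m])
        counts[pattern] = counts.get(pattern, 0) + 1
    # Only observed patterns contribute; unobserved ones add 0 * log(0) := 0.
    return sum((c / n) * math.log(c / n) for c in counts.values())


def approximate_entropy_chi_sq(bits, m):
    """Chi-square statistic of Equation (2.17)."""
    ap_en = phi(bits, m) - phi(bits, m + 1)
    return 2 * len(bits) * (math.log(2) - ap_en)


example = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
print(phi(example, 3), approximate_entropy_chi_sq(example, 3))
```

A highly regular stream gives an approximate entropy close to zero and therefore a large $\chi^2$, while a random stream gives an approximate entropy close to $\log 2$ and a small $\chi^2$.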

2.2.8 Cumulative Summation Test

The purpose of this test is to determine if the random walks starting from both ends of the bit-stream deviate from the average too quickly. The test is enacted by mapping the bits to \( \pm 1 \) and taking the sums of successively larger subsequences from the bit-stream starting from one side. The test statistic \( z \) is the largest absolute value in the set of sums. The P-value is found with the following equation:

\[
\text{P-value} = 1 - \sum_{k=\left(-\frac{n}{z}+1\right)/4}^{\left(\frac{n}{z}-1\right)/4} \left[ \Phi\left(\frac{(4k+1)z}{\sqrt{n}}\right) - \Phi\left(\frac{(4k-1)z}{\sqrt{n}}\right) \right] + \sum_{k=\left(-\frac{n}{z}-3\right)/4}^{\left(\frac{n}{z}-1\right)/4} \left[ \Phi\left(\frac{(4k+3)z}{\sqrt{n}}\right) - \Phi\left(\frac{(4k+1)z}{\sqrt{n}}\right) \right] \]  
(2.19)

where \( \Phi \) is the normal cumulative distribution function.
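
The cumulative sums calculation can be sketched as follows, assuming the bits are mapped to ±1 steps and the summation limits of Equation (2.19) are truncated toward zero. Both the rounding choice and the names `normal_cdf` and `cusum_p_value` are assumptions made for illustration, not details taken from the thesis.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cusum_p_value(bits, reverse=False):
    """Return (z, P-value) for a non-empty sequence of 0/1 integers."""
    ordered = list(reversed(bits)) if reverse else list(bits)
    steps = [2 * b - 1 for b in ordered]  # map 0 -> -1 and 1 -> +1
    n = len(steps)
    total, z = 0, 0
    for s in steps:
        total += s
        z = max(z, abs(total))  # largest excursion of the random walk
    root_n = math.sqrt(n)
    # Assumption: non-integer summation limits are truncated toward zero.
    upper = int((n / z - 1) / 4)
    first = sum(
        normal_cdf((4 * k + 1) * z / root_n) - normal_cdf((4 * k - 1) * z / root_n)
        for k in range(int((-n / z + 1) / 4), upper + 1)
    )
    second = sum(
        normal_cdf((4 * k + 3) * z / root_n) - normal_cdf((4 * k + 1) * z / root_n)
        for k in range(int((-n / z - 3) / 4), upper + 1)
    )
    return z, 1.0 - first + second
```

Calling the function once with `reverse=False` and once with `reverse=True` gives the forward and reverse variants of the test.
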
2.2.9 Poker Test

The poker test is no longer a part of the NIST suite, although it is similar to the approximate entropy test. It is used in [2] and provides a graphical representation of the randomness of the stream by plotting the frequency of every non-overlapping 4-bit binary number $i$ in a bar graph. The desired output of this test is for each column in the bar graph to have the same height, indicating that each number is equally likely to occur. If the output displays primarily decimal zeros (0000) or fifteens (1111), it can be inferred that there is a dominant amount of zero or one runs, respectively, in the bit-stream. Alternatively, a large number of fives (0101) and tens (1010) would indicate a deterministic clock-like signal.
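
The counting step behind the bar graph can be sketched as follows. The function name `poker_histogram` is illustrative, the bit-stream is assumed to be a string of `'0'` and `'1'` characters, and any incomplete trailing group of fewer than four bits is simply discarded.

```python
from collections import Counter


def poker_histogram(bits):
    """Count each non-overlapping 4-bit value (0..15) in a bit string."""
    usable = len(bits) - len(bits) % 4  # drop any incomplete trailing group
    counts = Counter(int(bits[i:i + 4], 2) for i in range(0, usable, 4))
    return [counts.get(value, 0) for value in range(16)]
```

For a random stream the sixteen counts should be roughly equal; a clock-like stream concentrates the counts at indices 5 and 10.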

2.3 Definition of Phase Noise and Timing Jitter

Phase noise is the frequency domain representation of random changes in the frequency of the carrier signal. It is defined as the ratio of power at a chosen sideband frequency to the power of the carrier. Single-sideband phase noise is calculated using Equation (2.20) as described in [8]:

$$L(f_m) = 10 \log \left( \frac{P_{\text{sideband}}(f_c + f_m, 1Hz)}{P_{\text{carrier}}} \right)$$  \hspace{1cm} (2.20)

where $P_{\text{sideband}}$ is the power of the sideband frequencies, $f_c$ is the carrier frequency or oscillating frequency of an ideal oscillator, $f_m$ is the frequency offset from the carrier to the sideband, and $P_{\text{carrier}}$ is the power of the ideal oscillator signal. Phase noise is measured in dBc/Hz; dBc refers to decibels relative to the carrier, or more simply, how many decibels lower the sideband power is than the carrier.
Frequency spectrum plots of (a) an ideal oscillator and (b) a noisy oscillator are shown in Figure 2.6. The ideal oscillator contains only one tone, at exactly the frequency of oscillation. In reality, noise can alter the period of oscillation, creating other frequencies centred on the carrier. These random frequencies form what is shown as a bell curve in Figure 2.6(b).

![Figure 2.6: Frequency spectrum plots for (a) an ideal periodic signal with frequency $f_c$ and (b) periodic signal with phase noise.](image)

The phase noise of an oscillator can be described using a Lorentzian spectrum:

$$L(f_m) = 10 \log \left( \frac{1}{\pi} \, \frac{\pi f_c^2 c}{f_m^2 + (\pi f_c^2 c)^2} \right)$$  \hspace{1cm} (2.21)

where $c$ is a scalar constant that defines the shape of the phase noise. Equation (2.21) can be simplified if $f_m \gg \pi f_c^2 c$ to Equation (2.22):

$$L(f_m) = 10 \log \left( \frac{f_c^2 c}{f_m^2} \right)$$  \hspace{1cm} (2.22)

A relationship can be formed between phase noise and cycle-to-cycle jitter in the following equation:

\[ L(f_m) = 10 \log \left( \frac{\sigma_c^2 f_c^3}{f_m^2} \right) \]  

(2.23)

where \( \sigma_c \) is the timing jitter. The previous equations for phase noise assume that the noise source is completely white, meaning flicker (1/f) noise was ignored.
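
As a hedged numerical illustration of Equation (2.23), the sketch below converts a cycle-to-cycle jitter into a phase noise figure. The function name and the example values (4 ps jitter, 75 MHz carrier, 1 MHz offset) are chosen for illustration only and are not measurements from this thesis.

```python
import math


def phase_noise_dbc_hz(sigma_c, f_c, f_m):
    """Eq. (2.23): white-noise phase noise in dBc/Hz at offset f_m.

    sigma_c is in seconds; f_c and f_m are in hertz.
    """
    return 10.0 * math.log10(sigma_c ** 2 * f_c ** 3 / f_m ** 2)


# Illustrative values only: 4 ps jitter, 75 MHz carrier, 1 MHz offset.
example = phase_noise_dbc_hz(4e-12, 75e6, 1e6)
```

Because the offset frequency appears squared in the denominator, doubling `f_m` lowers the result by about 6 dB, the slope expected when only white noise is considered.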

Timing jitter is the measurement of the noise from an oscillator in the time domain. There are two main components of timing jitter: random jitter and deterministic jitter. Only random jitter was considered in this thesis. Jitter is the random deviation in the period length of a periodic signal. Random jitter can be broken down further into cycle-to-cycle jitter and absolute jitter. Cycle-to-cycle jitter, denoted by \( \sigma_c \), is the threshold crossing deviation after one period of oscillation; an example of cycle-to-cycle jitter is shown in Figure 2.7.
Absolute jitter is the accumulation of cycle-to-cycle jitter, and therefore depends on the number of cycles observed. Absolute jitter can be defined as:

$$\sigma_{abs}(t = N\tau_{avg}) = \sum_{n=1}^{N} \left( \tau_n - \tau_{avg} \right)$$  \hspace{1cm} (2.24)

where \( N \) is the number of cycles, \( \sigma_{abs}(t = N\tau_{avg}) \) is the absolute jitter after \( N \) cycles, \( \tau_{avg} \) is the average period of oscillation and \( \tau_n \) is the actual period for a specific cycle. Absolute jitter only becomes a problem when using a free-running oscillator, which is an oscillator whose frequency is not corrected with negative feedback as it would be in a phase-locked loop. In a free-running oscillator, it does not matter when the threshold is crossed; the oscillator will continue as if nothing has changed. For an illustration of absolute jitter, refer to Figure 2.8.

The equation for cycle-to-cycle jitter can be obtained using the absolute jitter and making sure enough samples (cycles) are taken.

\[
\sigma_c^2 = \lim_{N \to \infty} \left( \frac{1}{N} \sum_{n=1}^{N} (\tau_n - \tau_{avg})^2 \right)
\]  

(2.25)
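
Equations (2.24) and (2.25) can be sketched for a finite list of measured periods as follows. The function names are illustrative, and the nominal period `tau_avg` is passed in explicitly as an assumption; if it were estimated as the mean of the same samples, the sum in Equation (2.24) would be zero after the last cycle by construction.

```python
import math


def absolute_jitter(periods, tau_avg):
    """Eq. (2.24): accumulated timing error after each cycle."""
    accumulated = []
    total = 0.0
    for tau_n in periods:
        total += tau_n - tau_avg
        accumulated.append(total)
    return accumulated


def cycle_to_cycle_jitter(periods, tau_avg):
    """Finite-sample estimate of Eq. (2.25)."""
    n = len(periods)
    return math.sqrt(sum((tau_n - tau_avg) ** 2 for tau_n in periods) / n)
```

The first function returns one value per cycle, which shows how the error of a free-running oscillator wanders as more cycles are observed.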

2.4 Phase and Jitter Models for Ring Oscillators

2.4.1 First Passage Time

Jitter can be approximated using the first passage time (FPT) method covered in Abidi [9]. This method considers the noise current that integrates over the load capacitance of a single delay cell of a ring oscillator. For the first simple case, a two-transistor digital CMOS inverter was used. The method is known as FPT because the jitter is measured from the first point at which the actual voltage waveform crosses the threshold level to the point at which the waveform is expected to cross. Refer to Figure 2.9 for an example of FPT.

\[ \sigma^2_c = \frac{v_n^2}{\left(\frac{\text{I}}{\text{C}}\right)^2} \]  

(2.26)

where \( v_n^2 \) is the noise voltage on the output capacitor and \( \left(\frac{\text{I}}{\text{C}}\right)^2 \) is the square of the output slew rate. The noise voltage on the load capacitance is simply the noise current from the MOS transistors divided by the capacitance; this equation is discussed in the paper by Leung [10]:

\[
\bar{v}_n^2 = \frac{t_d \bar{i}_n^2}{C^2} = \frac{t_d}{C^2} \, 4kT \frac{2}{3} g_m
\]  

(2.27)

where \( t_d \) is the time to reach the switching threshold, \( g_m \) is the transconductance of the transistor, \( k \) is the Boltzmann constant, \( T \) is the temperature in Kelvin, \( C \) is the capacitance of the load, \( \bar{i}_n^2 \) is the mean-square noise current and \( \bar{v}_n^2 \) is the mean-square noise voltage. Equation (2.27) shows the current noise of a MOS transistor in saturation [11]. Only one noise source is used to simplify the problem displayed in Figure 2.10; the transistors \( M_1 \) and \( M_2 \) and the capacitor \( C \) are considered noiseless. Only the noise current source \( i_{np}^2 \) is considered since the \( g_m \) of \( M_2 \) will be considerably smaller as it will have been turned off, making \( i_{n2}^2 \) much smaller.

Figure 2.10: Schematic of a simple inverter delay-cell with noise current.

Once the jitter has been acquired for one stage and one rise or fall, the following equation can be used to calculate the total FPT jitter of a ring oscillator:

\[ \sigma_{FPT} = \sqrt{2N} \times \sigma_c \]  

(2.28)

where \( N \) is the number of stages in the ring oscillator. The factor of 2 comes from the fact that, while the jitter calculated in Equation (2.27) was for only one edge, the PMOS and NMOS are assumed to generate the same noise current and therefore the jitter from both the rise and fall times are equal.

### 2.4.2 Last Passage Time

Another consideration with respect to jitter is last passage time (LPT). The difference between FPT and LPT is that LPT assumes that the actual waveform crosses the threshold level many times (as opposed to just once), thus increasing the jitter for that crossing. An exaggerated example of one crossing showing LPT is given in Figure 2.11.
The analysis of LPT provided by Leung [10] is complex; there is no closed-form solution for jitter using Leung's LPT calculations, but the cumulative distribution function (CDF) is described by Equation (23) in that paper. From this equation, it can be seen that the important factors for LPT are the slew rate at the threshold voltage, the threshold voltage, and the time that it takes to cross the threshold. A simplified closed-form solution was later devised by Leung in [12]:

\[
\sigma_{LPT} = \sqrt{2\bar{\sigma}_c^4 + \theta \bar{\sigma}_c^2}
\]  

(2.29)

where \(\sigma_{LPT}\) is the LPT for one stage and one edge, \(\theta\) is the time to reach the barrier or voltage threshold, and \(\bar{\sigma}_c\) is the total current noise divided by the load capacitance and slew rate for one stage and one edge. From Equation (2.29) it can be seen that the total jitter is a combination of the FPT variance \(\theta \bar{\sigma}_c^2\) and a new term \(\bar{\sigma}_c^4\), which demonstrates that LPT can be much greater than FPT because of the term raised to the fourth power.

From both models, it becomes apparent that a low slew rate is the key to increasing noise in a ring oscillator. The trade-off is the frequency of the oscillator, since more noise is introduced as the speed is reduced.

### 2.5 Impact of Phase Noise on Random Number Generators

The more noise the SO can produce, the slower the FO needs to be to still perform at the required levels. This is important because the FO frequency is upper bounded by the fabrication technology. The speed at which the seed can be delivered is determined by the SO which is required to recover the DFF output signal. The desired waveform would therefore need to be fast enough to achieve the speed requirements of the TRNG but also have a relatively low slew rate at the threshold level to increase timing jitter and improve random number generation results. The approach covered in Section 3.3 seeks to accomplish these tasks.
Chapter 3

Truly Random Number Generator

In this chapter, individual components of the TRNG are designed and tested. The Cadence software was used to produce simulations using the IBM 0.13\(\mu\)m technology provided by Canadian Microelectronics Corporation (CMC). Timing jitter for the SO was calculated using the noisetran function in the Eldo software [13]. One period was run multiple times to obtain the threshold crossing distribution. Timing jitter is the standard deviation of the normal distribution.

3.1 Fast Ring Oscillator Design

For the D input of the DFF, a particularly fast oscillator was required. The oscillator was required to be sufficiently fast so as to have one period of oscillation contained within the timing jitter of the clock input to the DFF, as was illustrated in Figure 2.4. This ensured that if at any time the FO had a 50% duty-cycle, the output would have had an equally likely chance of a one or a zero.

The easiest way to achieve the FO requirements was to implement a 3-stage simple inverter ring oscillator (RO). An RO is able to achieve fast speeds and has a saturated output, making the design of the DFF simpler. An odd number of stages is required to allow for oscillation since the output is single-ended and needs to be inverted. The minimum number of stages, three, was chosen to minimize the delay. One stage of a ring oscillator provides one unit of delay and is usually denoted as the delay cell.

![Figure 3.1: A simple 3-stage ring oscillator](image)

The frequency of the simple ring oscillator is determined from the delay of each stage [14]. A time domain graph of the three node voltages from Figure 3.1 is displayed in Figure 3.2. The equation used to calculate frequency of a simple ring oscillator is as follows:

\[
    f_o = \frac{1}{2N t_p} \tag{3.1}
\]

where \( N \) is the number of stages and \( t_p \) is the propagation delay of one cell. \( t_p \) can be replaced with 69% of the inverter’s time constant shown in Equation (3.2) using \( R \) as the equivalent resistance of the 'on' transistor in one of the inverters, and \( C \), the total capacitance at the node.

\[
    f_o = \frac{1}{2N \times 0.69RC} \tag{3.2}
\]
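
Equations (3.1) and (3.2) can be evaluated with a short sketch such as the one below. The function names and the numerical inputs mentioned afterwards are illustrative assumptions, not values extracted from the design.

```python
def ring_osc_frequency(n_stages, t_p):
    """Eq. (3.1): oscillation frequency from the per-stage propagation delay."""
    return 1.0 / (2 * n_stages * t_p)


def ring_osc_frequency_rc(n_stages, r_on, c_node):
    """Eq. (3.2): same frequency with t_p approximated as 0.69 * R * C."""
    return ring_osc_frequency(n_stages, 0.69 * r_on * c_node)
```

For example, a 3-stage ring with an assumed propagation delay of 17.5 ps per stage oscillates at roughly 9.5 GHz according to Equation (3.1).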

3.1.1 Transistor Level Simulation

The FO was designed to be as fast as a saturated ring oscillator can be. Since the simple inverter is single-ended, a minimum of three stages was needed to obtain the feedback inversion for the ring oscillator to oscillate. The strength of each delay cell was increased until the frequency gains levelled off due to increased capacitive load. The supply voltage was set to 1.2V, the recommended voltage level for the 0.13µm IBM CMOS technology, but could also be raised to increase speed if necessary. One of the three delay cells is shown in Figure 3.3; sizes were chosen to ensure adequate trade-off between driving power and load capacitance.
The final design output waveform is given in Figure 3.4. The output is almost sinusoidal and has a frequency of 9.5GHz and a duty cycle of 50%.

Figure 3.4: Transient simulation of the simple inverter ring oscillator. Frequency = 9.51GHz.
Since this ring oscillator and its slew rate are so fast, noise was considered negligible and ignored during the system level simulations.

### 3.2 Design 1 - Current-Starved Voltage Controlled Oscillator

The current-starved VCO is a versatile oscillator that allows control over both the rise and fall delays of the inverter by adjusting the bias voltages of the top and bottom transistors. A single delay cell is shown in Figure 3.5. All top and bottom transistors for the VCO are controlled by a current mirror with external control of the resistor values. This control allows for easy adjustment of the slew rate of each delay cell, which affects the jitter.

#### 3.2.1 Transistor Level Simulation

A nine-stage VCO was created using the delay cell in Figure 3.5. Both current mirrors were fixed to supply 150\(\mu\)A in order to create a 50% duty cycle clock signal. One period of the current starved VCO output is shown in Figure 3.6. The frequency of operation was approximately 75MHz.
Figure 3.5: Transistor level schematic of one delay cell for a current-starved inverter VCO.
The output capacitance of each delay cell was found to be approximately 35fF. Using this capacitance and the slew rates found at the switching threshold of the rising and falling edges of Figure 3.6, the jitter was estimated using FPT:

\[
v_{n\text{tot}}^2 = \frac{t_d \, i_{n\text{tot}}^2}{C_L^2}
\]  

(3.3)

\[
\sigma_{\text{tot}} = \sqrt{\frac{n}{2} \left( \frac{v_{\text{rise}}^2}{SR_{\text{rise}}^2} + \frac{v_{\text{fall}}^2}{SR_{\text{fall}}^2} \right)}
\]  

(3.4)
Table 3.1: Jitter calculation for current-starved VCO

<table>
<thead>
<tr>
<th></th>
<th>Falling</th>
<th>Rising</th>
</tr>
</thead>
<tbody>
<tr>
<td>$i_{n\text{tot}}^2$</td>
<td>$2.03 \times 10^{-24}$</td>
<td>$2.98 \times 10^{-24}$</td>
</tr>
<tr>
<td>$t_d$</td>
<td>600ps</td>
<td>542ps</td>
</tr>
<tr>
<td>$v_{n\text{tot}}^2$</td>
<td>$9.94 \times 10^{-7}$</td>
<td>$1.32 \times 10^{-6}$</td>
</tr>
<tr>
<td>Slew Rate</td>
<td>$7.8 \times 10^8$</td>
<td>$9.53 \times 10^8$</td>
</tr>
<tr>
<td>$\sigma^2$</td>
<td>$1.63 \times 10^{-24}$</td>
<td>$1.45 \times 10^{-24}$</td>
</tr>
<tr>
<td>Total Jitter(N=9)</td>
<td></td>
<td>3.73ps</td>
</tr>
</tbody>
</table>
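
The entries of Table 3.1 can be cross-checked with the following sketch. It assumes the noise voltage is obtained as the noise current multiplied by the time to threshold and divided by the square of the 35fF load capacitance, and that the per-edge variance follows Equation (2.26); the function names are illustrative.

```python
import math


def fpt_variance(i_n_sq, t_d, c_load, slew_rate):
    """Per-edge FPT jitter variance from noise current and slew rate."""
    v_n_sq = t_d * i_n_sq / c_load ** 2  # noise voltage on the load capacitance
    return v_n_sq / slew_rate ** 2       # Eq. (2.26)


def total_jitter(n_stages, var_rise, var_fall):
    """Eq. (3.4): total jitter of the ring from both edge variances."""
    return math.sqrt(n_stages / 2 * (var_rise + var_fall))


C_LOAD = 35e-15  # output capacitance of one delay cell, in farads
var_fall = fpt_variance(2.03e-24, 600e-12, C_LOAD, 7.8e8)
var_rise = fpt_variance(2.98e-24, 542e-12, C_LOAD, 9.53e8)
sigma_tot = total_jitter(9, var_rise, var_fall)  # in seconds
```

With the tabulated inputs, the two variances and the total agree with the last two rows of Table 3.1 to the precision shown there.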

Using the jitter obtained from Equation (3.4) and Table 3.1, the number of samples for a noise run was obtained. Equation (3.5) from [15] was used to obtain a sample size that would provide 95% confidence with an error $E$ of ±0.5ps for a standard deviation, or jitter, of 4ps:

$$n = \left( \frac{z_{\alpha/2} \sigma}{E} \right)^2 \quad (3.5)$$

An $n$ of approximately 250 was obtained; this value was used in the Eldo noisetran simulation below.
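
The arithmetic behind this sample size can be sketched as follows, assuming the standard two-sided normal score of 1.96 for 95% confidence; the function name is illustrative.

```python
import math


def required_samples(z_score, sigma, error):
    """Eq. (3.5): smallest whole sample size for the given error bound."""
    return math.ceil((z_score * sigma / error) ** 2)


# Jitter and error are both in picoseconds, so the units cancel.
n_runs = required_samples(1.96, 4.0, 0.5)
```

This evaluates to 246, which is consistent with the value of approximately 250 used for the noise runs.
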
Figure 3.7: Zoomed-in view of the threshold crossing spread after one period of the current-starved VCO with 250 noise runs.

Figure 3.8: Threshold crossing histogram of Figure 3.7 at 0.6V.

The standard deviation, and hence the jitter, of the threshold crossing histogram in Figure 3.8 was calculated to be 4.32ps.

3.3 Design 2 - Current-Stealing VCO

In general, in order to increase the amount of noise in a VCO, the slew rate must be decreased at the threshold. Decreasing the slew rate in turn makes the VCO slower. It was desired to make a faster VCO since it was going to be used as a clock input to the DFF. This same clock signal will be used to recover the noisy output bits. The speed of the noisy VCO was therefore the same speed at which the RNG seed was delivered. A trade-off was usually required between increasing the speed of the system and increasing the randomness of the ring oscillator based TRNG.

One way to reconcile a low slew rate with a fast VCO is to create a fast clock that has a low slew rate only as it passes the switching threshold. This was achieved using switch controlled current-stealing. Essentially, as a delay cell charges or discharges the capacitive load at the output, a switch triggers a mechanism to steal away that charging current from the delay cell. Less charging current results in a decreased slew rate, thus fulfilling the goal of the circuit.

A system level design of one delay cell is given in Figure 3.9. The switch S1 controls when $I_{STEAL}$ turns on and is itself controlled by two circuits. The first circuit is the rising edge control path, which controls the precise moment at which S1 is triggered on. The second, called the falling edge control path, governs the transmission gate, which in turn controls when S1 is turned off and the low-slew phase ends.
The equations for Design 2 FPT jitter are similar to those in Design 1, but because of the low-slewing phase of Design 2, LPT has a greater impact on overall timing jitter. The equations for LPT for a current-stealing oscillator are covered in detail in the technical report from Leung [12].

![Figure 3.9: System design of the current stealing delay cell](image)

### 3.3.1 Jitter Calculation

The output capacitance of each delay cell was found to be approximately 240fF. Using this value and the slew rates found at the switching threshold of the rising and falling edges of Figure 3.11, the jitter was estimated using FPT.

Table 3.2: Jitter calculation for current-stealing VCO

<table>
<thead>
<tr><th></th><th>Falling</th><th>Rising</th></tr>
</thead>
<tbody>
<tr><td>$i_{n\text{tot}}^2$</td><td>$7.45 \times 10^{-24}$</td><td>$3.8 \times 10^{-24}$</td></tr>
<tr><td>$t_d$</td><td>750ps</td><td>700ps</td></tr>
<tr><td>$v_{n\text{tot}}^2$</td><td>$9.46 \times 10^{-7}$</td><td>$1.32 \times 10^{-6}$</td></tr>
<tr><td>Slew Rate</td><td>$6.09 \times 10^8$</td><td>$2.58 \times 10^8$</td></tr>
<tr><td>$\sigma^2$</td><td>$2.62 \times 10^{-25}$</td><td>$7.08 \times 10^{-24}$</td></tr>
<tr><td>Total Jitter(N=7)</td><td></td><td>5.29ps</td></tr>
</tbody>
</table>

Using equation (3.5) and the results from Table 3.2, \(n\) was determined to be 250 while the error was approximately the same as in Design 1 at 0.6ps.

3.3.2 Transistor Level Simulation

A transistor level schematic of the system level design is illustrated in Figure 3.10. The main path consists of the primary delay cell, \(M_1\) and \(M_2\), and the stealing-transistor \(M_3\). This path behaves similarly to a regular delay cell in a VCO but with the added control of the stealing-transistor. The stealing-transistor is governed by the control circuitry, which consists of the rising and falling edge control paths. The falling edge control path uses the previous signal of the VCO to correctly time the opening and closing of the transmission gates, which sets how long the low-slew phase will be active. The rising edge control path is a delay path of the input signal to the stealing-transistor. The rising edge control path was designed such that it was moderately faster than the main path delay, thus ensuring the signal \(V_c\) would go high before \(V_{out}\), thereby turning on the stealing transistor and activating the low-slew phase around the switching threshold. Only half of each of the transmission gates is present in Figure 3.10 because each gate is only concerned with passing one level.

The PMOS transmission gate $M_{10}$ is used to pass a "1" through to the stealing-transistor, and since PMOS can pass a one without the $V_{th}$ decrease, the NMOS of the transmission gate is not needed. The same applies for the NMOS transmission gate, since it only passes a 0 which an NMOS can accomplish alone [16]. The sizes of the current-stealing delay-cell are given in Table 3.3.

For the simulations, the voltage supply was set to the recommended value of 1.2V and the simulation was run for 20ns.

Figure 3.10: Transistor level schematic for one current stealing delay cell
Table 3.3: Transistor sizing chart for Design 2 delay cell.

<table>
<thead>
<tr><th>Path</th><th>Transistor</th><th>Width</th><th>Length</th></tr>
</thead>
<tbody>
<tr><td colspan="4"><strong>Main Delay Path</strong></td></tr>
<tr><td rowspan="2">Main Inverter</td><td>M1</td><td>50um</td><td rowspan="2">0.6um</td></tr>
<tr><td>M2</td><td>4.96um</td></tr>
<tr><td>Stealing Transistor</td><td>M3</td><td>7.48um</td><td>0.6um</td></tr>
<tr><td colspan="4"><strong>Rising Edge Control Path</strong></td></tr>
<tr><td rowspan="3">First Inverter (starved)</td><td>M4A</td><td>10um</td><td rowspan="3">0.12um</td></tr>
<tr><td>M4</td><td>3.84um</td></tr>
<tr><td>M5</td><td>1.44um</td></tr>
<tr><td rowspan="2">Second Inverter</td><td>M6</td><td>1.28um</td><td rowspan="2">0.12um</td></tr>
<tr><td>M7</td><td>0.48um</td></tr>
<tr><td rowspan="2">Third Inverter</td><td>M8</td><td>3.84um</td><td rowspan="2">0.12um</td></tr>
<tr><td>M9</td><td>1.44um</td></tr>
<tr><td>PMOS Transmission</td><td>M10</td><td>4um</td><td>0.12um</td></tr>
<tr><td>NMOS Transmission</td><td>M11</td><td>1um</td><td>0.12um</td></tr>
<tr><td colspan="4"><strong>Falling Edge Control Path</strong></td></tr>
<tr><td rowspan="2">Gate Inverter</td><td>M12</td><td>1.28um</td><td rowspan="2">0.12um</td></tr>
<tr><td>M13</td><td>0.48um</td></tr>
</tbody>
</table>

A simulation frequency of 60MHz was achieved for the complete ring oscillator. The operation of the current-stealing VCO is further explained in Figure 3.11. Figure 3.11(b) clearly shows that the gate signal is the inversion of the input from the previous stage, and that $V_{gate}$ creates a window for the control signal to pass through. Figure 3.11(c) shows that the waveform has a low-slew phase at around 0.8V controlled by the signal $V_c$; this voltage was targeted to be the threshold value for the main inverter.

Figure 3.11: Transient operation of a current-stealing VCO.

Figure 3.12: Zoomed-in view of the threshold crossing spread after one period of the current-stealing VCO with 250 noise runs.
Figure 3.13: Threshold crossing histogram of Figure 3.12 at 0.685V

The standard deviation, and hence the jitter, of the threshold crossing histogram in Figure 3.13 was calculated to be 4.13ps.

A high switching threshold at the level of the low-slew phase was desired. Increasing an inverter's switching threshold can be achieved by either increasing the strength of the PMOS or decreasing the strength of the NMOS. Altering the strength of a transistor can be accomplished by a number of different methods. The first and simplest method for a full-custom design is to vary the size ratio (W/L) of the transistor. This changes the equivalent on resistance of the transistor and thereby alters the charging current. When the strength of the PMOS transistor in an inverter is increased, a higher input voltage is needed to turn the PMOS off and allow the inverter output to fall to ground. Designing the main delay path inverter in the current-stealing delay cell to achieve a high threshold voltage proved to be problematic; nevertheless, a solution is proposed in Section 3.4. The problem stemmed from increasing the \( W \) of \( M_1 \) to increase its strength. The capacitive load of the previous stage increased as \( W \) increased, based on the equation for capacitance of the gate given below:

\[
C_{gs1} = \frac{2}{3} W L C_{ox}.
\]  

(3.6)

The increase in capacitance load affected the speed and timing of each stage and made achieving the desired results difficult, resulting in the inability of the design to oscillate.

3.4 Design 3 - Current-Stealing VCO with Modifications

A simple solution to the problem discussed in Section 3.3 was to insert simple inverters with higher thresholds in between two current-stealing stages and allow the current-stealing main delay path inverter to obtain a balanced size ratio. This permitted more control over the threshold value. This new VCO is illustrated in Figure 3.14. The Design 3 VCO was able to produce more jitter than Design 2 because it fully took advantage of the low slew rate portion of the current-stealing cell waveform.

![Figure 3.14: Block diagram of Design 3.](image)

The final modification to Design 2 aimed to add additional noise to the current-stealing delay cell without altering the slew rate. This was accomplished by connecting the drains of two transistors, an NMOS and a PMOS, to the output of the current-stealing stage. The transistor level schematic of the Design 3 delay cell is shown in Figure 3.15. The two transistors were controlled with a current mirror which forced an equal current so that, when performing KCL at the output node, no additional current was allowed to enter or leave the output capacitance, assuming no channel length modulation. Since no current was added or removed, the slew rate remained unaffected. The total noise of the cell, however, did increase since noise is additive. The drain capacitance $C_{db}$ was much smaller than the gate capacitances of the following stage, hence the total load capacitance was not altered significantly. The jitter increased with the $g_m$ of these two new transistors. Since the on/off status of the new transistors was controlled by the output node voltage, Design 3 was slightly more complicated due to additional changes in current at specific times in the output node. The current mirrors for these extra noise sources were set to draw 20µA of current. The transistor sizes were similar to those of Design 2 with a few changes; they are listed in Table 3.4.

![Figure 3.15: Transistor level schematic for the main path of the Design 3 delay cell](image)
Table 3.4: Transistor sizing chart for Design 3 delay cells

<table>
<thead>
<tr><th>Path</th><th>Transistor</th><th>Size (W/L)</th></tr>
</thead>
<tbody>
<tr><td>Main Delay Path</td><td>M1</td><td>13.375um/0.6um</td></tr>
<tr><td></td><td>M2</td><td>4.96um/0.6um</td></tr>
<tr><td>Stealing Transistor</td><td>M3</td><td>1.7um/0.6um</td></tr>
<tr><td>Noise Transistors</td><td>M4n</td><td>50um/0.6um</td></tr>
<tr><td></td><td>M5n</td><td>50um/0.6um</td></tr>
<tr><td></td><td>M6n</td><td>50um/0.6um</td></tr>
<tr><td></td><td>M7n</td><td>50um/0.6um</td></tr>
<tr><td colspan="3"><strong>Rising Edge Control Path</strong></td></tr>
<tr><td>First Inverter (starved)</td><td>M4A</td><td>10um/0.12um</td></tr>
<tr><td></td><td>M4</td><td>3.84um/0.12um</td></tr>
<tr><td></td><td>M5</td><td>1.44um/0.12um</td></tr>
<tr><td>Second Inverter</td><td>M6</td><td>1.28um/0.12um</td></tr>
<tr><td></td><td>M7</td><td>0.48um/0.12um</td></tr>
<tr><td>Third Inverter</td><td>M8</td><td>3.84um/0.12um</td></tr>
<tr><td></td><td>M9</td><td>1.44um/0.12um</td></tr>
<tr><td>PMOS Transmission</td><td>M10</td><td>4um/0.12um</td></tr>
<tr><td>NMOS Transmission</td><td>M11</td><td>1um/0.12um</td></tr>
<tr><td colspan="3"><strong>Falling Edge Control Path</strong></td></tr>
<tr><td>Gate Inverter</td><td>M12</td><td>1.28um/0.12um</td></tr>
<tr><td></td><td>M13</td><td>0.48um/0.12um</td></tr>
<tr><td colspan="3"><strong>High-Threshold Delay-Cell</strong></td></tr>
<tr><td>Shift Inverter</td><td>M1b</td><td>8um/0.12um</td></tr>
<tr><td></td><td>M2b</td><td>0.5um/0.12um</td></tr>
</tbody>
</table>

One period of the Design 3 output is given in Figure 3.16 and shows an oscillation frequency of 37MHz.
Figure 3.16: Waveform of one period of the Design 3 VCO.
Figure 3.17: Timing jitter distribution for Design 3 at 300 noise runs and a threshold of 0.8V.

The standard deviation, and hence the jitter, of the threshold crossing histogram in Figure 3.17 was calculated to be 76.4ps.

### 3.5 D Flip-Flop

The DFF used for the TRNG and shown in Figure 3.18 was a sense-amplifier flip-flop covered in [17, 18]. The DFF operates using the clock signal and the sense-amplification of the D input and its complement to control the SR latch at the bottom. While the clock is low, $S_b$ and $R_b$ are both set high so that the NAND-based SR latch holds the current state. As soon as the clock goes high, the differential pair for the two D inputs turns on, setting the source of either $M_5$ or $M_7$ to ground and activating one of the inverters ($M_4$-$M_7$), bringing its output, $S_b$ or $R_b$, to ground. Since a NAND SR latch is active low, Q is set to equal D, and the circuit operates as a positive edge triggered DFF. Although setup and hold times are usually crucial factors in DFF and register design, they are not as significant for the TRNG. This is attributed to the nature of the system: it is not necessary for the input to pass all setup and hold conditions; as long as the times are smaller than the period of the D input, most of the output will propagate through the DFF as expected.

Figure 3.18: Sense amplifier DFF schematic.
3.6 Simulation Jitter Summary

Table 3.5: Summary of timing jitter from Eldo simulations.

<table>
<thead>
<tr>
<th>Design</th>
<th>Jitter(ps)</th>
</tr>
</thead>
<tbody>
<tr>
<td>D1</td>
<td>4.3173</td>
</tr>
<tr>
<td>D2</td>
<td>4.1293</td>
</tr>
<tr>
<td>D3</td>
<td>76.4</td>
</tr>
</tbody>
</table>
Chapter 4

TRNG Transistor Level Simulation

While the previous chapter demonstrated the functionality of the individual components of the TRNG, this chapter presents the results of the entire system. For the randomness tests, it was computationally inefficient to run 20,000 cycles of the whole system in Eldo to obtain the DFF output. Instead, the timing jitter of the SO of each design was used with an ideal FO and DFF to produce the 20,000 bit output-stream. This bit-stream was then tested with the randomness suite. An FO with frequencies of 1GHz, 5.5GHz, and 9GHz was used to calculate the output bit-stream from the SO jitter. 1GHz was the highest speed the extracted output buffers in Section 5.1 could transmit. 5.5GHz and 9GHz were the fastest frequencies that the FO could produce, with and without the consideration for parasitics, respectively.
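
One way such a behavioural bit-stream could be produced is sketched below. This is not the Matlab model used in the thesis: it assumes an ideal 50% duty-cycle FO, an ideal DFF that samples on every SO rising edge, and Gaussian cycle-to-cycle jitter that accumulates from period to period. The function name, parameters, and seed are all hypothetical.

```python
import random


def simulate_trng_bits(n_bits, f_fast, f_slow, sigma_c, seed=0):
    """Sample an ideal square-wave FO at jittered SO rising edges."""
    rng = random.Random(seed)
    t = 0.0
    bits = []
    for _ in range(n_bits):
        # Each SO period deviates from nominal by Gaussian jitter,
        # and the deviations accumulate in the edge time t.
        t += 1.0 / f_slow + rng.gauss(0.0, sigma_c)
        phase = (t * f_fast) % 1.0  # position within the FO period
        bits.append(1 if phase < 0.5 else 0)  # FO is high for the first half
    return bits


# Example call: 20,000 bits, 9GHz FO, 75MHz SO with 4.32ps jitter.
stream = simulate_trng_bits(20000, 9e9, 75e6, 4.32e-12)
```

The resulting list of bits can then be passed to the randomness tests of Section 2.2.
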
4.1 Design 1

4.1.1 Transistor Level Simulation

Design 1 consists of the FO and the current-starved VCO. A test bench was created and is shown in Figure 4.1.

![Figure 4.1: Simulation test bench for designs 1 and 2](image)

The two components used as the D and clock inputs to the DFF, respectively, resulted in Figure 4.2. From top to bottom, graphs show the D input, the clock input and the Q output of the DFF. Figure 4.2 shows that the whole system operated correctly. Since this was a Cadence simulation, no noise was applied and the output waveform was deterministic. The clock frequency for this configuration was approximately 170MHz.

![Figure 4.2: Transistor level simulation of Design 1.](image)

**4.1.2 Randomness Test**

For these tests an ideal FO in Matlab was used as the D input. For the clock, a 75MHz signal with a jitter of 4.32ps was used, as derived from Figure 3.8. A summary of NIST tests performed with 3 FO frequencies for 100 bit-streams is given in Table 4.1.

From the suite of NIST tests [6], it was determined that a number generator with this setup would not be considered random since none of the tests passed. The frequency histograms, for one sequence, in Figure 4.3 are shown to be sporadic and uneven, indicating that the distribution of bits was deterministic.
Table 4.1: Summary of randomness tests for Design 1

<table>
<thead>
<tr>
<th>Test</th>
<th>1GHz % Pass</th>
<th>Result?</th>
<th>5.5GHz % Pass</th>
<th>Result?</th>
<th>9GHz % Pass</th>
<th>Result?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frequency</td>
<td>23/100</td>
<td>FAIL</td>
<td>92/100</td>
<td>FAIL</td>
<td>33/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Block Frequency</td>
<td>0/100</td>
<td>FAIL</td>
<td>4/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Cumulative Sums (For.)</td>
<td>0/100</td>
<td>FAIL</td>
<td>87/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Cumulative Sums (Rev.)</td>
<td>0/100</td>
<td>FAIL</td>
<td>85/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Runs</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Longest Run</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>FFT</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Approx. Entropy</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Serial 1</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Serial 2</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
</tbody>
</table>
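For readers unfamiliar with how a single entry such as 23/100 arises, the frequency (monobit) test from the NIST suite can be sketched as below. The statistic follows the published NIST SP 800-22 definition; the significance level of 0.01 is the suite's customary default and is assumed here, and the helper names are illustrative only.

```python
import math


def frequency_test_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test p-value for a 0/1 sequence."""
    n = len(bits)
    # Map 0 -> -1 and 1 -> +1, then sum over the whole sequence.
    s_n = sum(2 * b - 1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2.0))


def passes(p_value, alpha=0.01):
    """A sequence passes when its p-value is at least alpha (assumed 0.01)."""
    return p_value >= alpha


balanced = [0, 1] * 10000            # equal numbers of ones and zeros
biased = [1] * 11000 + [0] * 9000    # 55% ones
print(passes(frequency_test_p_value(balanced)))  # True
print(passes(frequency_test_p_value(biased)))    # False
```

Each of the 100 bit-streams yields one p-value, and the count of streams with a passing p-value gives the fraction reported in the table.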
4.2 Design 2

4.2.1 Transistor Level Simulation

The system-wide test simulation was repeated for the second design. Design 2 consisted of the fast RO as the D input and the current-stealing VCO as the CLK input. The output waveform is given in Figure 4.4. From top to bottom, the graphs show the FO, the output of the current-stealing VCO (V1, blue) and its buffered output (clock, purple), and the DFF Q output. The system is again shown to be in working order but deterministic, as no noise was introduced. The clock frequency for this configuration was approximately 200MHz.

![Figure 4.4: Transistor level simulation for Design 2](image)

4.2.2 Randomness Tests

For these tests an ideal FO in Matlab was used as the D input. For the clock, a 60MHz signal with a jitter of 4.13ps was used, as derived from Figure 3.13. A summary of NIST tests performed with 3 FO frequencies for 100 bit-streams is given in Table 4.2.

From the suite of NIST tests [6], it was determined that a number generator with this setup would not be considered random, since not all of the tests passed. The frequency histograms, for one sequence, in Figure 4.5 are sporadic and uneven, indicating that the distribution of bits was deterministic.
Table 4.2: Summary of randomness tests for Design 2

<table>
<thead>
<tr>
<th>Test</th>
<th>1GHz</th>
<th>Result?</th>
<th>5.5GHz</th>
<th>Result?</th>
<th>9GHz</th>
<th>Result?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frequency</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Block Frequency</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Cumulative Sums (For.)</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Cumulative Sums (Rev.)</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Runs</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Longest Run</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>FFT</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>38/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Approx. Entropy</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Serial 1</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
<tr>
<td>Serial 2</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
<td>0/100</td>
<td>FAIL</td>
</tr>
</tbody>
</table>
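Table 4.2 shows bit-streams that pass the frequency-type tests yet fail the runs test. The sketch below, which follows the NIST SP 800-22 definition of the runs test, illustrates how that can happen: a perfectly balanced but strictly alternating sequence has far too many runs. The function name is illustrative and the example sequence is synthetic, not thesis data.

```python
import math


def runs_test_p_value(bits):
    """NIST SP 800-22 runs test p-value for a 0/1 sequence."""
    n = len(bits)
    pi = sum(bits) / n
    # Prerequisite in the NIST definition: the sequence must first be
    # close to balanced, otherwise the runs test is reported as failed.
    if abs(pi - 0.5) >= 2.0 / math.sqrt(n):
        return 0.0
    # Total number of runs = number of bit changes + 1.
    v_obs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    numerator = abs(v_obs - 2.0 * n * pi * (1.0 - pi))
    denominator = 2.0 * math.sqrt(2.0 * n) * pi * (1.0 - pi)
    return math.erfc(numerator / denominator)


alternating = [0, 1] * 10000
print(runs_test_p_value(alternating) < 0.01)  # True: balanced, but fails
```

A deterministic sampling pattern can therefore keep the ones and zeros balanced while still being rejected by the tests that look at ordering.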

4.3 Design 3

The system-wide test was not repeated for design 3 because of the similarity in SO waveforms. The randomness tests from the SO were the only item of interest for this design.

Figure 4.5: Four-bit distribution poker test for Design 2.
4.3.1 Randomness Tests

For these tests an ideal FO in Matlab was used as the D input. For the clock, a 37.7MHz signal with a jitter of 76.4ps was used, as derived from Figure 3.17. A summary of NIST tests performed with 3 FO frequencies for 100 bit-streams is given in Table 4.3.

From the suite of NIST tests [6], it was determined that the number generator could be considered random at 9GHz because every test passed at least 96 times out of 100. The poker test frequency histograms, for one sequence, in Figures 4.6(b) and 4.6(c) are level and even, indicating that the distribution of the bits appears to be random. At 5.5GHz most tests pass, the exceptions in Table 4.3 being the runs test and the approximate entropy test, which suggests that this is close to the lowest usable FO frequency. The 1GHz FO was not adequate for producing randomness in the output stream, even with a substantial amount of timing jitter.
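The figure of 96 passes out of 100 is in line with the proportion criterion suggested in NIST SP 800-22, where the acceptable pass proportion for m sequences is bounded below by p - 3*sqrt(p(1-p)/m) with p = 1 - alpha. The short calculation below is a sketch of that bound, assuming the customary alpha of 0.01; the function name is illustrative.

```python
import math


def minimum_pass_proportion(num_sequences, alpha=0.01):
    """Lower bound on the pass proportion suggested by NIST SP 800-22."""
    p_hat = 1.0 - alpha
    return p_hat - 3.0 * math.sqrt(p_hat * (1.0 - p_hat) / num_sequences)


threshold = minimum_pass_proportion(100)
print(f"{threshold:.4f}")  # prints 0.9602
```

For 100 bit-streams the bound is therefore roughly 96 passes, which is why 93/100 is marked as a failure while 97/100 is marked as a pass.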

Table 4.3: Summary of randomness tests for Design 3

<table>
<thead>
<tr>
<th>Test</th>
<th>1GHz</th>
<th>Result?</th>
<th>5.5GHz</th>
<th>Result?</th>
<th>9GHz</th>
<th>Result?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frequency</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Block Frequency</td>
<td>100/100</td>
<td>PASS</td>
<td>97/100</td>
<td>PASS</td>
<td>99/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Cumulative Sums (For.)</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Cumulative Sums (Rev.)</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
<td>100/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Runs</td>
<td>0/100</td>
<td>FAIL</td>
<td>39/100</td>
<td>FAIL</td>
<td>99/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Longest Run</td>
<td>0/100</td>
<td>FAIL</td>
<td>98/100</td>
<td>PASS</td>
<td>98/100</td>
<td>PASS</td>
</tr>
<tr>
<td>FFT</td>
<td>0/100</td>
<td>FAIL</td>
<td>100/100</td>
<td>PASS</td>
<td>97/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Approx. Entropy</td>
<td>0/100</td>
<td>FAIL</td>
<td>93/100</td>
<td>FAIL</td>
<td>97/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Serial 1</td>
<td>0/100</td>
<td>FAIL</td>
<td>98/100</td>
<td>PASS</td>
<td>98/100</td>
<td>PASS</td>
</tr>
<tr>
<td>Serial 2</td>
<td>0/100</td>
<td>FAIL</td>
<td>99/100</td>
<td>PASS</td>
<td>98/100</td>
<td>PASS</td>
</tr>
</tbody>
</table>
Figure 4.6: Four-bit distribution poker test for Design 3.
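A four-bit distribution of the kind plotted in the poker test figures can be produced as sketched below. The thesis does not state whether overlapping or non-overlapping blocks were used, so non-overlapping blocks are assumed here; the statistic is the classical poker test statistic X = (16/k) * sum(f_i^2) - k, and the function names are illustrative.

```python
from collections import Counter


def four_bit_histogram(bits):
    """Count each 4-bit pattern over non-overlapping blocks of the stream."""
    usable = len(bits) - len(bits) % 4
    counts = Counter()
    for i in range(0, usable, 4):
        value = bits[i] * 8 + bits[i + 1] * 4 + bits[i + 2] * 2 + bits[i + 3]
        counts[value] += 1
    # Return the counts for patterns 0000 through 1111 in order.
    return [counts.get(v, 0) for v in range(16)]


def poker_statistic(histogram):
    """Poker test statistic; zero for a perfectly flat histogram."""
    k = sum(histogram)
    return 16.0 / k * sum(c * c for c in histogram) - k


# A strictly alternating stream puts every block into the pattern 0101.
print(four_bit_histogram([0, 1] * 8))
```

A level histogram gives a small statistic, while a stream dominated by a few patterns, as in the deterministic cases, gives a very large one.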
Chapter 5

Fabrication and Testing

All fabrication and layout designs were intended for use with the 0.13\(\mu m\) IBM CMOS technology. Minimum sizing for this technology is 120nm length and 160nm width. The standard power supply is 1.2V but can be increased to as high as 1.6V to improve the speed of the oscillators if required [19].

5.1 Buffer Design

All instruments used in testing had input capacitances of 10-15pF; therefore, in order to analyze the signals properly, each chip output required buffering. To achieve the correct driving ability, a simple inverter chain was used. Using the logical effort method for sizing a chain of inverters, it was determined that the optimum number of stages for a 15pF load was eight, using an effective fan-out of three [16]. The layout for this buffer is given in Figure 5.1. Effective fan-out is defined as the ratio of the sizes of two consecutive stages of an inverter chain. A fan-out of three therefore means that the widths of the second inverter are three times the widths of the first inverter. For high-frequency signals, however, an effective fan-out of three makes the RC time constant of an inverter too large: the delay of one stage becomes longer than the half period of the signal to be buffered, resulting in truncation of the output signal. This buffer was therefore used as a slow-speed buffer for the sub-1GHz outputs, such as the clocks and Q.
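The relation between load, fan-out and stage count in an inverter chain can be illustrated with a short calculation. In a chain where each stage is f times larger than the previous one, N stages bridge a capacitance ratio of f^N, so N = ln(C_load/C_in)/ln(f). The input capacitance of the first inverter is not stated in the text, so the sketch below only derives the value implied by eight stages at a fan-out of three; the function names are illustrative.

```python
import math


def stage_count(c_load_f, c_in_f, fan_out):
    """Stages needed so that each stage drives fan_out times its own input."""
    return math.log(c_load_f / c_in_f) / math.log(fan_out)


def implied_input_capacitance(c_load_f, stages, fan_out):
    """First-stage input capacitance implied by a given chain and load."""
    return c_load_f / fan_out ** stages


# Eight stages with a fan-out of three driving 15pF.
c_in = implied_input_capacitance(15e-12, 8, 3.0)
print(f"implied C_in = {c_in * 1e15:.2f} fF")   # about 2.29 fF
print(stage_count(15e-12, c_in, 3.0))           # about 8
```

A smaller fan-out lowers the delay per stage at the cost of more stages for the same load.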

Another buffer was designed with a fan-out of 1.8, allowing the FO (5GHz-10GHz) to be analyzed off chip. Using this fan-out, a 17-stage buffer was created. A problem was encountered after the 13th stage: not enough current was being delivered to drive the capacitance of the following stages, and the signal consequently died out. To resolve this issue, only the first 13 stages were used, meaning that the output signal was not rail-to-rail. Since the only value to be extracted from the FO output was the running frequency, this was judged to be an acceptable loss. The high-speed buffer is shown in Figure 5.2.

5.1.1 Layout

The slow-speed buffer used an area of 150µm × 60µm including the large guard ring. The guard ring was included to isolate large fluctuations in inverter supply voltage from the rest of the chip. The number of fingers for each transistor was increased at each stage so as to spread the large charging current onto many wires. This also helped to maintain a compact buffer.
The high-speed buffer was slightly larger at 160µm × 60µm because of the extra stages.

5.1.2 Parasitic Extraction and Simulations

Each buffer was laid out and the parasitic capacitance and resistance were extracted into a new netlist. These new extracted circuits were simulated to determine the performance of each buffer in a situation as close to the actual microchip as possible.
The slow-speed buffer was tested by passing an ideal 200MHz sine wave through it to determine how well the slow VCO and DFF Q output could drive a 15pF scope load. The results of this test are given in Figure 5.3.

![Figure 5.3: Input and output signal for the slow-speed buffer at 200MHz with a 15pF load.](image)

The same test was repeated, where the frequency of the input was changed from 200MHz to 1GHz. The buffer had a difficult time producing a large signal. A peak-to-peak voltage of 200mV was desired in order for a clear signal to appear on the testing oscilloscope. Figure 5.4 shows that the slow-speed buffer could only produce a 150mVp-p signal at 1GHz.

Figure 5.4: Input and output signal for the slow-speed buffer at 1GHz with a 15pF load.

The high-speed buffer was also tested at 1GHz and was able to produce a 350mVp-p output, as displayed in Figure 5.5. The high-speed buffer could not, however, go above 1.5GHz without the signal being attenuated to an unacceptable level. The FOs for both Design 1 and Design 2 were therefore redesigned to produce an extracted signal frequency of only 1GHz. This was much lower than the 5.5GHz signal that the extracted fast VCO could originally produce, and thus greatly affected the randomness of the Q output for each system.
5.2 Design 1

After testing the high-speed buffer with extracted parasitics, it was determined that it still could not produce a 5GHz output signal. The FO for both designs was therefore modified to run at approximately 1GHz so that its output could be read.

5.2.1 Layout

The layout for Design 1 with labelled sections is given in Figure 5.6. Two internal buffer chains were introduced to isolate each oscillator from its load and to supply sharp edges, so that the inputs to the DFF were clear digital signals, either 0V or 1.2V, improving function and reducing glitches. An example of one delay cell for the current-starved VCO is illustrated in Figure 5.7. Each group of NMOS transistors was surrounded by a guard ring to prevent latch-up from occurring. The guard ring design could have been optimized for size by including all the NMOS transistors of the VCO in one ring, but it was decided to err on the side of caution and produce a layout that had the best chance of producing results. Design 1 occupied an area of 155µm by 55µm.

![Figure 5.6: Layout of Design 1 TRNG.](image)
5.2.2 Parasitic Extraction and Simulations

Design 1 was connected to two slow-speed buffers for the clock and Q signals and one high-speed buffer for the D input, and was laid out in its exact form on the microchip to be submitted. The parasitic capacitances of each node were extracted using the CALIBRE tool in Cadence to create a new netlist with all elements included. This netlist was simulated and provided the waveforms shown in Figure 5.8. The D input, which swung rail-to-rail internally, had a peak-to-peak voltage of 300mV at a frequency of 1.02GHz. The extracted frequency of the noisy clock was 132.12MHz, which, as expected, was lower than the 170MHz simulated without the parasitic capacitance models. The Q output has a non-clock-like waveform, but was still deterministic since no noise was introduced into the full system simulations.
5.3 Design 2

5.3.1 Layout

The layout for Design 2 with labelled sections is given in Figure 5.9. The area of Design 2 was 6800\(\mu m^2\). An example of one delay cell for the current-stealing VCO is illustrated in Figure 5.10.
Figure 5.9: Layout of Design 2 TRNG.
5.3.2 Parasitic Extraction and Simulations

Design 2 was connected to two slow-speed buffers for the clock and Q signals and one high-speed buffer for the D input, and was laid out in its exact form on the microchip to be submitted. The parasitic capacitances of each node were extracted using the CALIBRE tool in Cadence to create a new netlist with all elements included. This netlist was simulated and provided the waveforms shown in Figure 5.11. The D input, which swung rail-to-rail internally, had a peak-to-peak voltage of 300mV at a frequency of 1.02GHz. The extracted frequency of the noisy clock was 86MHz, which, as expected, was lower than the 200MHz simulated without parasitic capacitance. Since no noise was introduced into this simulation, the Q output had a deterministic, clock-like waveform. Figure 5.11 shows that Design 2 did function correctly.
Figure 5.11: Design 2 full extraction simulation with 15pF load on each output

5.4 Layout considerations

The full microchip layout, including designs, buffers, ESD protection and metal filling, is given in Figure 5.12. The chip dimensions are 1mm by 1mm. The various parts are highlighted on the figure. The buffers were positioned at the top of the chip to keep their large voltage fluctuations from coupling into the designs through the substrate. This placement could potentially have skewed the test results by adding more uncertainty, making certain VCOs appear better at producing noise than others. Each design, as well as the group of buffers, had its own $V_{DD}$ and $V_{SS}$ to further isolate the fluctuations. This also provided the ability to increase or decrease the supply voltage, and consequently the speed, of each design, allowing for finer control over the operation.
Figure 5.12: Full submitted chip layout for ICGWTRNG in 0.13um IBM technology.
5.4.1 ESD protection

Electrostatic discharge (ESD) protection was included by adding double-diodes provided by the IBM ESD library [20]. The double-diodes were used for all control input signals. If any input signal rose above $V_{DD}$ or fell below $V_{SS}$, the diodes turned on and redirected the current to the $V_{DD}$ or $V_{SS}$ pads, thus protecting the thin gate oxides of the current mirror inputs. Figure 5.13 provides a schematic of the double-diode ESD protection.

For all other pads the very large drains of the last stage of each buffer were considered sufficient protection. For latch-up, all NMOS transistors connected to $V_{SS}$ were separated from PMOS transistors connected to $V_{DD}$ by a guard ring. This prevented PNP to NPN connections from forming and sinking too much current, which would otherwise lead to those sections of the chip burning up.

5.5 PCB Layout

A 3” x 3” two-layer PCB was designed to test the chip. SMA connectors and single pins were used for each output to allow for easy testing setup. Jumpers were used for most supply paths as well as for connection of bias inputs to the current mirrors. This provided the ability to easily control what was turned on, as well as measure current in each of these paths. The PCB was fabricated by Albert Printed Circuit Boards.
5.6 Testing

The $0.13 \mu m$ chip was fabricated through CMC and The MOSIS Service company. The extracted layout simulations in the previous sections of this chapter indicated that Designs 1 and 2 should still function after fabrication. Design 3 was not finalized in time for the design submission deadline, so it was excluded from the fabrication.
5.6.1 Design 1

Due to constraints on the number of output pads on the chip, only Design 1 had separate supply control for the FO. This extra control was implemented to help troubleshoot any problems that could be faced during testing. Figure 5.15 shows the D input to both designs.

The biasing for the clock of Design 1 was altered to lower the frequency to 70MHz, so as to allow a closer comparison with Design 2. The first waveform captured was the clock output with the fast RO turned off. This allowed a clean signal to be observed, without supply coupling from the other RO affecting the frequency of oscillation.

![Figure 5.15: Fast RO output from Design 1. Running frequency = 923MHz](image-url)

A 20,000 bit long waveform from the clean clock in Figure 5.16 was extracted into Matlab and the cycle-to-cycle jitter was calculated to be 17.33ps. The jitter distribution of this clean clock is given in Figure 5.17.
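One way to obtain such a jitter figure from a captured waveform is sketched below: find the rising threshold crossings by linear interpolation, form the periods, and take the standard deviation of the differences between adjacent periods. The exact definition and script used in the thesis are not given in the text, so this is an assumption about the general approach, and the function names are hypothetical.

```python
import statistics


def rising_crossings(times, volts, threshold):
    """Times at which the waveform crosses threshold while rising."""
    crossings = []
    for i in range(1, len(volts)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 < threshold <= v1:
            # Linear interpolation between the two samples around the crossing.
            frac = (threshold - v0) / (v1 - v0)
            crossings.append(times[i - 1] + frac * (times[i] - times[i - 1]))
    return crossings


def jitter_stats(crossings):
    """Mean period, period jitter and cycle-to-cycle jitter from edge times."""
    periods = [b - a for a, b in zip(crossings, crossings[1:])]
    deltas = [b - a for a, b in zip(periods, periods[1:])]
    return {
        "mean_period": statistics.mean(periods),
        "period_jitter": statistics.pstdev(periods),
        "cycle_to_cycle_jitter": statistics.pstdev(deltas),
    }
```

The list of crossing times is also the data behind a threshold crossing histogram of the kind shown in the figures of this section.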
The Tektronix application DPOjet was also used to obtain timing jitter statistics. Figure 5.18 shows the eye diagram and time interval error [21] derived from 50,000 cycles of the clean clock.
Table 5.1: Summary of DPOjet jitter stats for Design 1 for clean clock

<table>
<thead>
<tr>
<th>Description</th>
<th>Mean</th>
<th>Std Dev</th>
<th>Population</th>
</tr>
</thead>
<tbody>
<tr>
<td>TIE</td>
<td>0.000s</td>
<td>143.58ps</td>
<td>50446</td>
</tr>
<tr>
<td>RJdd1</td>
<td>19.097ps</td>
<td>3.6055ps</td>
<td>35</td>
</tr>
<tr>
<td>DJdd1</td>
<td>187.20ps</td>
<td>99.122ps</td>
<td>35</td>
</tr>
</tbody>
</table>
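Time interval error (TIE) measures each edge against an ideal reference clock rather than against the previous edge, so slow drift accumulates in it. This is one plausible reason why the TIE standard deviation in Table 5.1 is much larger than the cycle-to-cycle figure computed in Matlab. The sketch below assumes the ideal clock is a least-squares fit to the measured edges; the clock recovery method actually configured in DPOjet is not stated in the text, and the function name is hypothetical.

```python
import statistics


def time_interval_error(crossings):
    """TIE of each edge relative to an ideal clock fitted to the edges."""
    n = len(crossings)
    index_mean = (n - 1) / 2.0
    time_mean = statistics.mean(crossings)
    # Least-squares line through (edge index, edge time):
    # the slope is the ideal period, the intercept is the ideal start time.
    num = sum((k - index_mean) * (t - time_mean) for k, t in enumerate(crossings))
    den = sum((k - index_mean) ** 2 for k in range(n))
    period = num / den
    start = time_mean - period * index_mean
    return [t - (start + k * period) for k, t in enumerate(crossings)]
```

With this choice of reference the TIE values average to zero by construction, which matches the zero mean reported for TIE in the table.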

The FO was then connected. The clock waveform in Figure 5.19 showed many distortions at the rails that would affect the overall timing.

Figure 5.19: Screen shot of PCB design for testing the chip. Running frequency = 72.4MHz

A 20,000 bit-string from the regular clock in Figure 5.19 was recorded, and Matlab was used to calculate the cycle-to-cycle jitter, which was 951.713ps. The jitter distribution of this regular clock is given in Figure 5.20. The jitter did not follow a normal distribution, so the calculated value is less meaningful for comparison with the simulated value from Figure 3.8.
Figure 5.20: Threshold crossing histogram for a clean CLK signal with D turned off

The Tektronix application DPOjet was also used to obtain timing jitter statistics. The eye diagram and time interval error [21] are given in Figure 5.21.

Figure 5.21: DPOJet eye-diagram and time interval error distribution of the clean clock waveform

Table 5.2: Summary of DPOJet jitter stats for Design 1 for regular clock

<table>
<thead>
<tr>
<th>Description</th>
<th>Mean</th>
<th>Std Dev</th>
<th>Population</th>
</tr>
</thead>
<tbody>
<tr>
<td>TIE</td>
<td>0.0000s</td>
<td>949.78ps</td>
<td>60259</td>
</tr>
<tr>
<td>RJdd1</td>
<td>152.31ps</td>
<td>148.87ps</td>
<td>10</td>
</tr>
<tr>
<td>DJdd1</td>
<td>1.6215ns</td>
<td>1.2567ns</td>
<td>10</td>
</tr>
</tbody>
</table>

Figure 5.22 shows an example of the Q and clock on-chip outputs.
Randomness Tests

The Design 1 clock was compared to three ideal FO frequencies to obtain three sets of 10 bit-streams to be tested against the 10 bit-streams obtained on-chip. Due to lack of time only 10 bit-streams could be acquired from the clock and Q of Design 1. Table 5.3 provides a summary of the results obtained for the randomness tests. Figure 5.23 shows one poker test distribution for each set of bit-streams tested.
Table 5.3: Summary of randomness tests for chip output of Design 1

<table>
<thead>
<tr>
<th>Test</th>
<th>1GHz</th>
<th>5.5GHz</th>
<th>9GHz</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frequency</td>
<td></td>
<td>PASS</td>
<td></td>
</tr>
<tr>
<td>Block Frequency</td>
<td>10/10</td>
<td>PASS</td>
<td>10/10</td>
</tr>
<tr>
<td>Cumulative Sums</td>
<td>8/10</td>
<td>PASS</td>
<td>8/10</td>
</tr>
<tr>
<td>Cumulative Sums (Rev.)</td>
<td>9/10</td>
<td>PASS</td>
<td>9/10</td>
</tr>
<tr>
<td>Runs</td>
<td>0/10</td>
<td>FAIL</td>
<td>0/10</td>
</tr>
<tr>
<td>Longest Run</td>
<td>0/10</td>
<td>FAIL</td>
<td>0/10</td>
</tr>
<tr>
<td>Approx. Entropy</td>
<td>0/10</td>
<td>FAIL</td>
<td>0/10</td>
</tr>
<tr>
<td>Serial 1</td>
<td>0/10</td>
<td>FAIL</td>
<td>0/10</td>
</tr>
<tr>
<td>Serial 2</td>
<td>0/10</td>
<td>FAIL</td>
<td>0/10</td>
</tr>
</tbody>
</table>
Figure 5.23: Four-bit distribution poker test for chip output for Design 1

5.6.2 Design 2

The on-chip clock for Design 2 is shown in Figure 5.24. It had a much smaller peak-to-peak voltage than expected but the frequency of 60MHz was close to the extracted simulation frequency. Further testing was required to troubleshoot the operation of the clock output.
A 20,000 bit-string from the regular clock in Figure 5.24 was recorded, and Matlab was used to calculate the cycle-to-cycle jitter, which was 1.506ns. The jitter distribution of this regular clock is given in Figure 5.25.
Figure 5.25: Threshold crossing histogram for the Design 2 clock from chip

The Tektronix application DPOjet was also used to obtain timing jitter statistics. Figure 5.26 shows the eye diagram and time interval error [21].

Figure 5.26: DPOJet eyediagram and time interval error distribution of the Design 2 clock from chip
Table 5.4: Summary of DPOjet jitter stats for Design 2 clock from chip

<table>
<thead>
<tr>
<th>Description</th>
<th>Mean</th>
<th>Std Dev</th>
<th>Population</th>
</tr>
</thead>
<tbody>
<tr>
<td>TIE1, Ch1</td>
<td>0.0000s</td>
<td>891.72ps</td>
<td>27845</td>
</tr>
<tr>
<td>RJdd1, Ch1</td>
<td>340.08ps</td>
<td>86.897ps</td>
<td>22</td>
</tr>
<tr>
<td>DJdd1, Ch1</td>
<td>1.2655ns</td>
<td>881.45ps</td>
<td>22</td>
</tr>
</tbody>
</table>

Figure 5.27 shows an example of the Q and clock on-chip outputs.

Randomness Tests

The Design 2 clock was compared to three ideal FO frequencies to obtain three sets of 100 bit-streams to be tested against the 5 bit-streams obtained on-chip. Due to time constraints, only 5 bit-streams were acquired from the on-chip Q for Design 2. Table 5.5 provides a summary of the results obtained for the randomness tests. Figure 5.28 shows one poker test distribution for each set of bit-streams tested.
Table 5.5: Summary of randomness tests for chip output of Design 2.

<table>
<thead>
<tr>
<th>Test</th>
<th>Frequency</th>
<th>Block Frequency</th>
<th>Cumulative Sums (For.)</th>
<th>Cumulative Sums (Rev.)</th>
<th>Runs</th>
<th>Longest Run</th>
<th>FFT</th>
<th>Approx. Entropy</th>
<th>Serial 1</th>
<th>Serial 2</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>1GHz</td>
<td>5.5GHz</td>
<td>9GHz</td>
<td>1GHz</td>
<td>5.5GHz</td>
<td>9GHz</td>
<td>1GHz</td>
<td>5.5GHz</td>
<td>9GHz</td>
<td>1GHz</td>
</tr>
<tr>
<td>% Pass</td>
<td>35/100</td>
<td>90/100</td>
<td>39/100</td>
<td>37/100</td>
<td>26/100</td>
<td>72/100</td>
<td>99/100</td>
<td>45/100</td>
<td>94/100</td>
<td>98/100</td>
</tr>
<tr>
<td>Result?</td>
<td>FAIL</td>
<td>FAIL</td>
<td>FAIL</td>
<td>FAIL</td>
<td>FAIL</td>
<td>FAIL</td>
<td>FAIL</td>
<td>FAIL</td>
<td>FAIL</td>
<td>FAIL</td>
</tr>
<tr>
<td>% Pass</td>
<td>100/100</td>
<td>100/100</td>
<td>96/100</td>
<td>96/100</td>
<td>99/100</td>
<td>98/100</td>
<td>100/100</td>
<td>98/100</td>
<td>98/100</td>
<td>100/100</td>
</tr>
<tr>
<td>Result?</td>
<td>PASS</td>
<td>PASS</td>
<td>PASS</td>
<td>PASS</td>
<td>PASS</td>
<td>PASS</td>
<td>PASS</td>
<td>PASS</td>
<td>PASS</td>
<td>PASS</td>
</tr>
</tbody>
</table>
5.6.3 Summary

Both designs failed to produce random bit-streams at the fabricated FO frequency. However, when compared to an ideal FO with a higher frequency, both designs showed randomness.

Design 2 showed a slightly more uniform distribution than Design 1, as shown in the poker tests in Figure 5.23(d) and Figure 5.28(d). This was expected because of the increased amount of jitter observed in Figure 5.25 over Figure 5.17.
Chapter 6

Conclusions

Three ring oscillator based TRNGs were designed using a noisy VCO to create randomness. Design 1 used a standard current-starved delay-cell as the RNG clock, and had the lowest timing jitter of all the designs created. Design 2 used a newly designed current-stealing, low-slewing delay-cell. The exploitation of multiple crossings and LPT resulted in improved jitter over the previous design, but not quite to the desired extent. The difficulty arose from setting the switching threshold of the subsequent stage to the low-slew phase level. This issue was alleviated by creating Design 3, a modification of Design 2. Design 3 involved inserting simple two-transistor inverters between the current-stealing cells, allowing for easier control of the threshold. In addition, more noise was introduced through extra transistors on each current-stealing delay-cell. Design 3 provided exceptional timing jitter of approximately 75ps, indicating that multiple crossings and LPT were being utilized.

The outputs of each design were tested under a suite of tests outlined by the NIST. The results of the tests indicated that the first two designs were not sufficiently random. Only Design 3 provided adequate noise to obtain the required randomness.
The excess noise in the final design allows for further customization of the TRNG, as the speed can be increased while still delivering acceptable randomness. This would improve the overall speed at which the seed from the TRNG is delivered.

Designs 1 and 2 were both fabricated on a 0.13\( \mu m \) process chip and tested with an oscilloscope and Matlab. The results showed that both on-chip outputs with an FO of 1GHz were not random. Design 2 was slightly more random than Design 1. The SO waveform was extracted for both designs and used in conjunction with Matlab to test FOs with frequencies of 5.5GHz and 9GHz, resulting in random bit-streams from both designs.

In comparison to other oscillator based RNG research [2, 22], the achieved speed of 30-75MHz seems very reasonable. These designs were built with a focus on the novel idea of utilizing last passage time to increase phase noise. The frequency was kept around the same value for each design so that they could be compared with each other. Power consumption was not considered in this work.

6.1 Future work

Design 3 was not prepared in time for fabrication and thus for direct comparison of results with Designs 1 and 2. Theoretically, Design 3 should provide vast improvements in jitter production, as simulations showed a substantial increase in performance. Implementing Design 3 on a chip would therefore be a worthwhile endeavour. The design would be similar in size to the original Design 2.
References




