Circuit Design For Reliability
Circuit Design For Reliability
Gilson Wirth Editors
Circuit
Design for
Reliability
Circuit Design for Reliability
Ricardo Reis • Yu Cao • Gilson Wirth
Editors
123
Editors
Ricardo Reis Yu Cao
Instituto de Informática School of ECEE
Universidade Federal do Rio Arizona State University
Grande do Sul (UFRGS) Tempe, USA
Porto Alegre, Rio Grande do Sul, Brazil
Gilson Wirth
Electrical Engineering Department
Universidade Federal do Rio
Grande do Sul (UFRGS)
Porto Alegre
Rio Grande do Sul
Brazil
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Ricardo Reis, Yu Cao, and Gilson Wirth
2 Recent Trends in Bias Temperature Instability . . . . . . . . . . . . . . . . . . . . . . . . . 5
B. Kaczer, T. Grasser, J. Franco, M. Toledano-Luque,
J. Roussel, M. Cho, E. Simoen, and G. Groeseneken
3 Charge Trapping Phenomena in MOSFETS: From Noise
to Bias Temperature Instability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Gilson Wirth and Roberto da Silva
4 Atomistic Simulations on Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Dragica Vasileska and Nabil Ashraf
5 On-Chip Characterization of Statistical Device Degradation . . . . . . . . . 69
Takashi Sato and Hiromitsu Awano
6 Compact Modeling of BTI for Circuit Reliability Analysis . . . . . . . . . . . . 93
Ketul B. Sutaria, Jyothi B. Velamala, Athul Ramkumar,
and Yu Cao
7 Circuit Resilience Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Veit B. Kleeberger, Christian Weis, Ulf Schlichtmann,
and Norbert Wehn
8 Layout Aware Electromigration Analysis
of Power/Ground Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Di-an Li, Malgorzata Marek-Sadowska, and Sani R. Nassif
9 Power-Gating for Leakage Control and Beyond . . . . . . . . . . . . . . . . . . . . . . . . 175
Andrea Calimera, Alberto Macii, Enrico Macii,
and Massimo Poncino
v
vi Contents
10 Soft Error Rate and Fault Tolerance Techniques for FPGAs. . . . . . . . . . 207
Fernanda Kastensmidt and Ricardo Reis
11 Low Power Robust FinFET-Based SRAM Design in Scaled
Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Sumeet Kumar Gupta and Kaushik Roy
12 Variability-Aware Clock Design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Matthew R. Guthaus and Gustavo Wilke
Chapter 1
Introduction
R. Reis • G. Wirth
Electrical Engineering Department, Universidade Federal do Rio Grande do Sul (UFRGS),
Porto Alegre, Rio Grande do Sul, Brazil
Y. Cao ()
School of ECEE, Arizona State University, Tempe, AZ, USA
e-mail: ycao@asu.edu
observations of BTI are first briefly reviewed. Afterwards it is shown how the
understanding of gate oxide defect properties can be used to explain BTI and its
variability, and possible technological solutions for both nFETs and pFETs are
discussed. The charge trapping phenomena that plays a dominant role in BTI also
dominates low-frequency noise. A modeling and circuit simulation approach that
focuses on operation conditions relevant for digital and analog design, including
large signal AC operation, is presented. The modeling and simulation approach aims
at helping circuit designers to cope with this reliability issue.
Next, the modeling and characterization is addressed. Any modeling effort
needs a solid base to be built on. Models are based on accurate measurement
methodologies, which contribute to understanding the mechanisms behind the
device degradation. In this book, authors survey characterization methods for
the purpose of device modeling that includes the effects of temporal parameter
degradation, such as BTI and noise. Particular focus is placed on the approaches that
utilize on-chip circuits to enhance the accuracy and fidelity of the measurements.
Modeling must focus on real circuit operation conditions, such as Dynamic Voltage
Scaling (DVS), or biasing found in mixed signal designs where the stress waveform
is much more random. Modeling solutions suitable to allow aging simulation under
all dynamic stress conditions are presented.
Variability in electrical parameters, such as variability in threshold voltage, is
also known to be a major reliability concern. Among the sources of variability
is random dopant fluctuation (RDF). We discuss how RDF can induce significant
variation in saturation (on) current and transconductance degradation—two key
metrics for benchmark performance of digital and analog integrated circuits.
An analysis of ongoing scaling related trends of integrated circuits is also
presented. For this purpose, authors discuss major trends observed in process
technology, as well as in memory and logic designs. While newer technology styles,
such as FinFETs provide necessary advances to enable further technology scaling,
we can observe that there exists also a trend for reduced resilience with ongoing scal-
ing. Our hope is that with the roadmap presented in this book, which concentrates on
FinFET-based designs, we are able to sensitize the research community to present
and near future problems associated with this technology. Additionally, different
architectures for SRAM memories are analyzed, to demonstrate the different design
choices available and tradeoffs that have to be made.
Finally, the last chapters present design techniques for reliability. FinFETs have
emerged as alternatives to conventional bulk MOSFETs in scaled technologies
due to superior gate control of the channel, lower short channel effects and
higher scalability. However, width quantization in FinFETs constrains the design
space of FinFET-based circuits, especially SRAMs in which transistor sizing is
critical for the circuit robustness. The adverse effects of width quantization can
be mitigated by appropriate device-circuit co-design of FinFET-based memories.
Some of such techniques are described, with an emphasis on the device-circuit
interactions associated with each methodology. The impact of different technology
1 Introduction 3
options in FinFETs like gate-underlap, fin orientation, fin height, gate workfunction
and independent control of the gates on the stability, power and performance of 6T
SRAMs is discussed.
Electromigration is an aging effect that affects metal wires (interconnections). In
this book, authors briefly introduce physical foundations of electromigration (EM).
We discuss physical parameters affecting EM wire lifetime and we introduce some
background related to the existing EM physical simulators. In the work presented
in this book for EM physical simulation, authors adopt the atomic concentration
balance-based model. We discuss the simulation setup and results. We present
VEMA, a variation-aware electromigration (EM) analysis tool for power grid wires.
The tool considers process variations caused by the chemical–mechanical polishing
(CMP) and edge placement error (EPE). VEMA is a full-chip EM analysis tool.
It performs EM lifetime calculation, and analyzes process variation effects on EM
reliability and reports variation tolerances of EM-sensitive power grid wires.
Integrated circuits subject to ionizing radiation are sensitive to transient faults
caused by the interaction of ionizing particles with the semiconductor. At ground
level, the main source of radiation are neutrons that interact with the material
generating secondary particles that can ionize the silicon, provoking transient upset
in circuits fabricated in nanometer technology. The interaction of the ionizing
radiation with the transistor may provoke transient and permanent effects. Different
fault tolerance techniques can be applied to FPGAs according to their type of
configuration technology, architecture and target operating environment. A chapter
is dedicated to present a set of fault mitigation techniques for SRAM, FLASH and
ANTIFUSE-based FPGAs. A test methodology to characterize those FPGA under
radiation is also presented. Results from neutron-induced faults are presented and
discussed.
The relevance of power consumption and leakage currents is also increasing.
Although several Design-for-Low-Power and Design-for-Variability options are
already available in modern EDA suites, the contrasting nature of the two metrics
makes their integration extremely challenging. Most of the approaches used to
compensate and/or mitigate circuit variability (e.g., Dynamic Voltage Scaling and
Adaptive Body Biasing) are, in fact, intrinsically power inefficient, as they exploit
the concept of redundancy, which is known to originate power overhead. In this
book, we introduce possible solutions for concurrent leakage minimization and
variability compensation. More specifically, authors propose Power-Gating as a
mean for simultaneously controlling static power consumption and mitigating the
effects induced by two of the most insidious sources of variability, namely, electrical
parameter variations and Aging due to BTI.
High-performance clock network design has been a challenge for many years
due to the drastically increasing effect of process variability. In addition, tight power
budgets have lowered supply voltage levels, which make designs more sensitive to
noise. Together, variability and noise present a colossal challenge to clock designers
in order to meet timing, yield, and power simultaneously. This chapter discusses the
different strategies that designers use to ameliorate variability and noise problems
in clock network design.
4 R. Reis et al.
The project of this book was born during the successful IEEE Circuits and
Systems Society Summer School on Physical Design of Reliable Circuits that was
held in Porto Alegre, Brazil, from January 12 to 15, 2010. Some chapters of the
book are based on the talks realized during the school and other well-known invited
authors wrote other chapters.
It is our hope that this book will provide the vitally important physical under-
standing and design techniques for state of the art and future technologies, ranging
from technology modeling, fault detection and analysis, circuit hardening, and
reliability management in different circuit units.
Chapter 2
Recent Trends in Bias Temperature Instability
Abstract The paradigm shifts occurring in the past few years in our understanding
of BTI are reviewed. Among the most significant ones is the shift from perceiving
NBTI in terms of the Reaction-Diffusion model to analyzing BTI with the tools
originally developed for describing low-frequency noise. This includes the interpre-
tation of the time, temperature, voltage, and duty cycle dependences of BTI. It is
further demonstrated that a wealth of information about defect properties can be
obtained from deeply-scaled devices, and that this information can allow projection
of variability issues of future deeply downscaled CMOS devices. The chapter is
concluded by showing the most promising technological solutions to alleviate both
PBTI and NBTI.
1 Introduction
Among the critical reliability issues facing present and future deeply downscaled
CMOS devices is the so-called Bias Temperature Instability (BTI). While BTI in
nFET devices was generally ascribed to charge trapping in the high-k portion of
the gate oxide, the interpretation of BTI in pFET devices still generates controversy
[1–7]. This phenomenon in pFET devices has been previously described by
diffusion of hydrogen from and back into substrate/gate oxide interface states [8].
The so-called Reaction–Diffusion model based on this assumption is still popular,
especially in the design community [9], despite being inconsistent with some crucial
observations [10].
In this chapter, the elementary definitions and experimental observations of BTI
are first briefly reviewed. One of the most intriguing properties of BTI—the lack
BTI is a consequence of charging of defect states in the gate oxide and at its
interface [2]. The defects could be both pre-existing or generated during device
operation. The trapped charge results in a shift of the device parameters, such as its
threshold voltage Vth , channel mobility, transconductance, and subthreshold slope,
and generally a decrease of the FET’s drive current. The name is derived from the
phenomenon being strongly accelerated by temperature T and gate bias VG . BTI
in n-channel FET devices, which are typically biased in circuits at positive VG ,
is referred to as Positive BTI (PBTI), while negative BTI (NBTI) takes place in
p-channel FETs. Constant VG stress bias is often referred to as static, or “DC” BTI,
while periodically interrupted VG stress is called “AC”, or dynamic BTI.
3 Static BTI
Figure 2.1 illustrates the typical gradual shift of pFET threshold voltage Vth during
accelerated stress at elevated T [5]. The stress data are typically measured at several
VG ’s and Vth is extrapolated to 10 years at the circuit operating voltage VDD (or
VDD C 10 %). The extrapolated Vth must be below a given value (typically 30 or
50 mV) for the technology to qualify.
This simple extrapolation procedure is, however, complicated by Vth decreas-
ing immediately after the stress bias is removed, as illustrated in Fig. 2.1 [15]. As
will be discussed henceforth, this recovery, or relaxation, component R typically
proceeds simultaneously on many time scales, making it difficult to determine
its beginning or end and thus separating it from the final non-recoverable, or
permanent, component P [2, 6]. This Vth relaxation is thus a crucial problem for
BTI measurement, interpretation, and extrapolation.
2 Recent Trends in Bias Temperature Instability 7
-ΔVth (mV)
effect is seen (note: Vth 0 V stress relaxation
15
for this device). Inset
illustrates the biases applied 0V
at FET terminals during the 10
BTI measurement T = 125ⴗC
VG 0V VG,stress = -2 V
5
VG,relax = 0 V
-0.05V
0
0 500 1000 1500 2000
time (s)
50
40
ΔVth (mV)
30
~2x
20
10
T = 125 C tstress = 1000 s VS = VD = VB = VCC = 2 V
0 0
DC 10 102 104 106 108 1010
frequency (Hz)
Fig. 2.2 The Vth shifts due to 50 % AC unipolar NBTI stress in pFETs are seen independent of
frequency in the entire frequency range of 1 Hz–2 GHz. The Vth shift of the corresponding DC
NBTI stress obtained on an identical device is shown for comparison. Inset: Micrograph of the
on-chip circuit for the DC and AC BTI measurements consisting of a ring oscillator, a frequency
divider, a buffer, a pass-gate-based multiplexer, and the device under test [17]
4 Dynamic BTI
In many CMOS applications, such as logic, the majority of the FETs are constantly
switched and thus exposed to dynamic stress [16]. Figure 2.2 documents that
NBTI is present at frequencies up to the GHz range, i.e., there does not appear
to be any “cut-off” time constant of the degradation mechanism above 1 ns [17].
Furthermore, the AC bias signal reduces BTI with respect to the DC stress. This
provides some additional reliability margin, which can be factored in during the
application design phase [9].
8 B. Kaczer et al.
0
0 20 40 60 80 100
Duty Factor
ΔVth (mV)
1/trelax for 7 decades is a
40 10-3
signature of states with
discharging time constants 30 10-4
covering as many decades.
(b) Gate-referred noise 10-5
20
spectra measured on the same 10-6
(unstressed) devices show 10
clear 1/f dependence, 10-7
routinely explained by a 0 10-8
superposition of states with 10-4 10-3 10-2 10-1 100 101 102 103 104 105
widely distributed time scales trelax (s)
b 10-9
SiON
10-10 T = 25⬚C
unstressed
10-11
SVG (V2 Hz-1)
10-12 VG = -0.6 V
10-13
VG = -0.3 V
10-14
10-15
10-16 0
10 101 102 103 104 105 106
frequency (Hz)
off” times are typically observed [12]. For the sake of simplicity it is therefore
assumed here that the time constants are log-uniformly distributed from times
much shorter than the switching time of a pFET to very long, corresponding to
the lifetime of a CMOS application. Such states with widely distributed time scales
are then represented by “RC” elements in Fig. 2.5 with the total FET Vth being
proportional to the sum of voltages (“occupancies”) on all capacitors. For the sake
of simplicity, it is assumed that all RC elements have the same weight and can
be partially occupied, which emulates the behavior of a large-area device. Most
properties of the recoverable component can be reproduced when the ohmic resistors
in Fig. 2.5 are replaced with a non-linear component (simulated by two diodes with
different parameters, see Fig. 2.5), which emulates different charging (i.e., capture)
and discharging (i.e., emission) time constants of each defect [21]. Such a circuit
correctly reproduces DF (Fig. 2.6, cf. Fig. 2.3) and also the log-like relaxation
(Fig. 2.4a) and the log-like initial phase of stress (not shown) [19].
10 B. Kaczer et al.
a b
… …
Fig. 2.5 (a) An equivalent circuit with exponentially increasing capacitances used to emulate
defect states with widely distributed time scales, such as those active in low-frequency noise.
(b) The same circuit modified to account for charging (i.e., capture) and discharging (i.e., emission)
time constants being voltage dependent, represented by asymmetric diodes. The sum of voltages
on capacitors is assumed to be proportional to FET Vth
dependence of R is also
qualitatively well reproduced 6
trelax = 10-3, 10-2, … , 104a.u.
by the equivalent circuit in
Fig. 2.5, as is the decrease 5
with increasing relaxation
4 relaxation
time (cf. Fig. 2.3, which,
however, shows the sum of R 3
and P)
2
0
0 20 40 60 80 100
Duty Factor
Figure 2.7 shows two typical Vth relaxation transients following positive VG stress
on a single 70 90 nm2 nMOSFET (i.e., corresponding to PBTI). Conversely to the
continuous relaxation curves obtained on large devices, a quantized Vth transient
is observed in the deeply-scaled devices. In such devices, the relaxation is observed
to proceed in discrete voltage steps, with each step corresponding to discharging of
a single oxide defect [12, 22, 23]. Upon repeated perturbation, each defect shows up
in the relaxation trace with a characteristic “fingerprint” consisting of its discharge,
or emission time, and its voltage step [14].
Figure 2.8 shows the two-dimensional histogram of the heights and the emission
times of the steps when the experiment was repeated 70 times at the same stressing
and relaxing condition as in Fig. 2.7 [14]. In Fig. 2.8, four clusters are clearly formed
that correspond to four active defects in the time window of the experimental setup.
The emission times of each defect are stochastically distributed and follow an
exponential distribution. This allows us to determine the average emission time e .
The capture time of each trap can be obtained by varying the stress (i.e., charging)
time from 240 ms down to 2 ms. The intensity of the cluster decreases with reducing
2 Recent Trends in Bias Temperature Instability 11
stress time when the characteristic capture time is in the range of the stress time. The
fit of the intensity to Pc D 1 exp(tstress / c ) lets us calculate the average capture
time c . This technique is known as Time Dependent Defect Spectroscopy (TDDS)
[14].
In Fig. 2.8, an identical experiment was repeated at 50 ı C on the same device.
Note the large horizontal shift of the clusters to shorter emission times with only a
25 ı C temperature increase. The Arrhenius plots of the emission and capture times
obtained at T from 10 to 50 ı C (not shown) provide activation energies of 0.48 eV
for emission and 0.25 eV for capture. Similarly thermally activated capture and
emission times are also observed in both nFET and pFET (i.e., corresponding to
NBTI) with conventional SiO2 gate oxide [14, 23, 24]. One can therefore conclude
for all these cases that both emission and capture in both electron and hole gate
oxide traps are without a doubt thermally activated processes. This experimental
fact is incompatible with direct elastic tunneling theories widely used in different
oxide trap characterization techniques and calculations. Consequently, a new model
that takes into account this thermal dependence has to be considered.
Fig. 2.9 Simulated capture and emission time constants (lines) compared with to the experimental
TDDS values obtained on SiO2 pFETs (symbols) during NBTI stressing at 125 and 175 ı C and
varying VG . The experimental occupation probability of the charged state fp is also indicated. The
configuration coordinate diagram is shown in the inset (dashed line: neutral defect state; solid line:
charged state potential)
Fig. 2.10 Simulated RTN, stress, and recovery behavior of a nano-scale device using a stochastic
solution algorithm of the proposed model. (a) At the threshold voltage (Vg1), the RTN is dominated
by defect #5 with the occasional contribution from defect #3. Defects #1, #2, and #4 remain
positively charged within the ‘simulation/experimental’ window. (b) During stress (Vg2), the
capture times are dramatically reduced by the higher (more negative) gate voltage and the defects
#3 and #5 become predominantly positively charged (£c £e). Defects #1, #2, and #4 start
producing RTN. (c) During recovery (back at Vg1), trapped charge is subsequently lost and the
dynamic equilibrium behavior is gradually restored
It has been previously argued that the phenomenon called NBTI relaxation in
pFET devices is in fact just a different facet of the well-known low-frequency noise
in these devices. While the low-frequency noise corresponds to the channel/gate
dielectrics system being in the state of dynamic equilibrium, NBTI relaxation
corresponds to the perturbed system returning to this equilibrium [28]. Figure 2.10
14 B. Kaczer et al.
4
99.99
3 99.9
99
2 97
90
1
percentile
probit(FN)
70
0 N = 1, 2, .., 20 50
30
-1
10
-2 N=6 η ≈ 4 mV 3
1
N=9
-3 0.1
N = 14
0.01
0 40 80 120 160
-ΔVth (mV)
Fig. 2.12 Equation (2.1) in a probit plot rescaled to fit experimental distributions from Fig. 2.10
of Ref. [22], with the corresponding values of N and readily extracted
consist of devices with a different number n of oxide defects in each device, with n
being Poisson distributed [12, 22, 28]. The total Vth distribution can be therefore
obtained as
X1
e N N n n .n; Vth =n/
FN .Vth ; / D 1 ; (2.1)
nD0
nŠ nŠ
where N is the mean number of defects in the FET gate oxide and is related to the
oxide trap (surface) density Not as N D W L Not . The CDF is plotted in Fig. 2.12
for several values of N. For comparison, measured total Vth distributions for three
different stress times from Ref. [22] are excellently fitted by the derived analytical
description.
The advantage of describing the total Vth distribution in terms of Eq. (2.1) is its
relative simplicity and tangibility of the variables. The analytical description allows,
among other things, to calculate NBTI threshold voltage shifts in an unlimited
population of devices, a feat practically impossible through device simulations.
10 Technological Solutions
Once the underlying BTI mechanisms are understood, the defect properties can be
modified to beneficial ends. Below, two possible technological solutions for both
PBTI and NBTI are discussed.
16 B. Kaczer et al.
100
10-1
ΔVth (V) ref
10-2 La
ref T = 125⬚C
10-3 trelax = 10-3s
lines: tstress = 10 years
La
symbols: tstress = 6x103s
10-4
1 2 5 10
Eox (MV cm-1)
Fig. 2.13 A significant reduction of PBTI threshold voltage shift is observed in planar nFETs
with La passivation (“La”) over the reference stack (“ref”) without passivation. Simplified power-
law projection to 10 years shows passivated stack having sufficient reliability (Vth < 30 mV) at
5 MV/cm operating field
Reduction of gate stack EOT, which is one of the most efficient ways to improve
FET performance, enhances NBTI due to increased oxide electric field. As a
consequence, 10 year lifetime can be guaranteed for sub-1 nm EOT Si pFETs only
at gate overdrive voltages far below the expected operating voltages (Fig. 2.14).
Another way to improve FET performance is the use of high-mobility substrates,
such as buried-channel SiGe [34]. Because of the valence band offset between the
2 Recent Trends in Bias Temperature Instability 17
Fig. 2.14 Plot of the operating overdrive voltage jVG Vth j for 10 year lifetime assuming a 30 mV
threshold voltage shift criterion vs. the inversion capacitance equivalent thickness Tinv , for Si
channel devices with different processing used as a reference, and for SiGe pFETs. For low Tinv ,
Si devices jVG Vth j is below the expected operating voltage. In contrast to that, optimized SiGe
devices show improved lifetime. Inset: gate-stack band diagram in inversion. Si cap acts as a barrier
EV for holes
SiGe and the Si cap (see inset of Fig. 2.14), inversion channel holes are confined in
the SiGe layer, which therefore acts as a quantum well (QW) for holes. The Si cap
lowers the inversion capacitance as compared to the accumulation capacitance. For
these devices it is therefore necessary to report the capacitance-equivalent thickness
in inversion (Tinv , evaluated at VG D Vth 0.6 V) which will be affected by the
thickness of the Si cap [35].
As can be seen from Fig. 2.14, SiGe-based device gate stacks significantly
increase operating gate overdrive while still guaranteeing 10 year device lifetime
and at the moment seem to be the only solution to the NBTI issue for sub-1 nm
EOT devices. It has been recently observed that both increasing the Ge content
in the channel as well as increasing the SiGe QW thickness reduces NBTI. Most
intriguingly, a reduction of Si cap thickness also diminishes NBTI [35]. The most
likely hypothesis explaining all three trends appears to be the energetic decoupling
of the buried channel and the gate oxide defects [36].
Conclusions
In this chapter some of the shifts occurring in the past few years in our
understanding of BTI were reviewed. Among the most significant ones is the
shift from perceiving NBTI in terms of the Reaction–Diffusion model to ana-
lyzing BTI with the tools originally developed for describing low-frequency
(continued)
18 B. Kaczer et al.
noise. This includes the interpretation of the time, temperature, voltage, and
duty cycle dependences of BTI. It was further demonstrated that a wealth
of information about defect properties can be obtained from deeply-scaled
devices, and that this information can allow interpretation of variability issues
of future deeply downscaled CMOS devices. This theme was complemented
by showing the most promising technological solutions to alleviate both PBTI
and NBTI.
References
1 Introduction
The major goal of this chapter is to compile and critically discuss recent work per-
formed by the authors on modeling of charge trapping and de-trapping phenomena
in nanometer scale CMOS devices [1–6, 16–18]. The role of charge trapping and
de-trapping in both low-frequency noise and bias temperature instability (BTI) is
discussed.
G. Wirth ()
Electrical Engineering Department, Universidade Federal do Rio Grande do Sul (UFRGS),
Porto Alegre, Rio Grande do Sul, Brazil
e-mail: wirth@ece.ufrgs.br
R. da Silva
Physics Institute, UFRGS, Porto Alegre, Brazil
Fig. 3.2 If a transistor is abruptly turned on, the Fermi level abruptly changes and trap occupation
probability follows the Fermi level (EF ) change. Charge trapping/de-trapping is not an instanta-
neous event. It is governed by characteristic time constants, and the trap occupation reflects the
new Fermi level after an elapsed time. This figure depicts the situation for electron trapping in the
channel of a NMOS transistor
Charge trapping and de-trapping at localized states (charge traps) at the interface
or in the gate dielectric is a significant reliability issue for CMOS applications.
It is known to be a source of low-frequency noise, besides playing a role in Bias
Temperature Instability (BTI).
Traps that contribute to noise are the ones that keep switching their state between
occupied and empty, as depicted in Fig. 3.1. These are the traps with occupation
probability close to 50 %, which means that their capture and emission times are
similar.
Traps that contribute to BTI are the ones that have a high probability to stay
occupied after a charge trapping event, as depicted in Fig. 3.2. These are the traps
with occupation probability close to 100 %, which means that their capture time is
much shorter than the emission time.
Figure 3.3 depicts the mechanisms leading to BTI and low-frequency noise.
If a constant bias is applied to a Metal–Insulator–Semiconductor system, only
the localized states (traps) with energy close to the Fermi level show significant
activity, i.e., change their states between occupied and empty, by capture and
subsequent emission of charge carriers from the channel region. This may be
considered a steady state condition, where the occupation probability of a trap is
time independent, i.e., a constant that does not change with time. This mechanism
is known to originate low-frequency noise.
However, if the bias point changes abruptly, as is common in e.g. digital CMOS
applications, the trap occupation probability abruptly changes. For instance, if the
3 Charge Trapping Phenomena in MOSFETS: From Noise to Bias. . . 23
Fig. 3.3 (Left hand side) Traps that contribute to NBTI are the ones that stay occupied after
an capture event occurs. This leads to a degradation of transistor on current over time. (Right
hand side) Traps that contribute to noise are the ones that keep switching their state over time,
exchanging charge carriers with the channel inversion layer. This leads to Random Telegraph Noise
in the device current
gate voltage abruptly changes the Fermi level in such a way that trap occupation
probability is increased, the rate at which charge carries are captured abruptly
becomes larger than the rate at which carriers are emitted, and the number of trapped
charge increases over time. Traps change their occupation state according to their
characteristic time constant, meaning that the number of trapped charge does not
instantaneously reflect the new occupation probability. The faster traps (with shorter
capture time constants) become filled first, while the slowest traps take longer to
become filled.
Each trap that becomes occupied degrades the channel conductivity, decreasing
the device current. For nanometer scale devices, this current decrease is seen to
occur in discrete steps, each step being related to the capture of a single channel
carrier (in large area device it becomes hard to clearly identify the discrete steps).
Since the dynamics of this occupation depends on the bias point and temperature, it
may lead to bias temperature instability (BTI).
The first sections of this chapter cover the modeling of low-frequency noise,
while the last section covers the modeling of BTI.
24 G. Wirth and R. da Silva
The origin of RTN noise is the alternate capture and emission of charge carriers at
discrete trap levels near the Si–SiO2 interface. Figure 3.4 depicts the cross section of
an n-channel MOSFET through the location of the interface trap. The influence of
the traps on the electrical current flowing through the channel is twofold. On the one
hand, the occupation of a trap changes the number of free carriers in the inversion
layer. On the other hand, a charged trap state has an influence on the local mobility
near to its position due to Coulomb scattering. If the MOSFET biasing is kept
constant, a stationary RTN is observed at the terminals of the device as a discrete
fluctuation in electrical current, ıId being the amplitude of the current fluctuation,
as shown in the inset of Fig. 3.5. The average high current time corresponds to the
electron capture time constant ( c ), and the average low current time corresponds to
the emission time constant ( e ).
The power spectrum of a RTN fluctuation can be evaluated by calculating the
auto-covariance of the signal, and then applying the Wiener–Khinchin formula [12].
The power spectrum is then the Fourier transform of the auto-covariance, and is a
Lorentzian, as given by Eq. (3.1) and depicted in Fig. 3.5.
ı2 ˇ 1 1
S .!/ D (3.1)
.1 C ˇ/2 !0 1 C .!=!0 /2
where ı is the RTN amplitude, ¨0 is the angular corner frequency and “ is equal
to £c /£e . The amplitude of the RTN fluctuation induced by a trap depends on trap
position along the channel. It may be exacerbated by factors such as random dopant
fluctuations (RDF). For nanometer scale channel lengths (L), traps near the source
end of the channel can cause significant mobility fluctuations and surface potential
Trap
Gate Oxide
Inversion Layer
Drain Current ID
Fig. 3.4 Schematic cross section of the inversion layer of a MOS transistor through the location
of an interface trap. If the trap is electrically charged the inversion layer is disturbed by the trap,
affecting the drain current ID . The trap not only affects the number of free carriers in the inversion
layer but is also a source of electrical charge carrier scattering
26 G. Wirth and R. da Silva
S(log)
Current
τc
δΙd
τe
Time
f(log)
Fig. 3.5 Time and frequency domain representation of a stationary random telegraph noise (RTN).
In frequency domain, the power spectrum of a RTN is a Lorentzian. In time domain discrete
fluctuations are observed in the drain current, where c is the average time in the high current
state, which corresponds to the state where the trap is electrically neutral (empty). e is the average
time in the low current state, which corresponds to the state where the trap is electrically charged.
ıId is the amplitude of the current fluctuation
fluctuations, hindering electrical current flow and enhancing RTN amplitude (ı).
A trap positioned near the source junction may be able to create a repulsive Coulomb
blockade well surrounding the trap, hindering trap injection from the source into
the channel. Thus, a significant mobility fluctuation will be added to carrier number
fluctuation for carrier electrons trapped near the source side [17]. Another parameter
that may impact RTN amplitude is trap position into the dielectric.
Equation (3.1) assumes that the capture and emission time constants ( c and
e ) are time independent, constant values. This is true for constant (time invariant)
biasing. However, under cyclo-stationary excitation the applied bias voltage is a
periodic function of time, and c (t) and e (t) become periodic functions of time.
Cyclo-stationary excitation is depicted in Fig. 3.6.
In order to derive the power spectrum under cyclo-stationary excitation, we did
follow the methodology originally proposed by Machlup [12] for stationary RTN.
A RTN is considered to be a purely random signal, which may be in one of two
states, called 1 and 0. If the signal is in state 1, the probability of making a transition
to 0 in a short time dt is assumed to be dt/ c (t). If the signal is in state 0, the
probability of making a transition to 1 is assumed to be dt/ e (t). In this form, the
state 1 is related to the empty trap, i.e., high current state in Fig. 3.5, while the state
0 is related to the occupied trap, i.e., low current state.
3 Charge Trapping Phenomena in MOSFETS: From Noise to Bias. . . 27
where P11 (s) is the probability of an even number of transitions in time s, given we
start in state 1. P(x(t) D 1), the probability of being in state 1 at time t, is given by
Z T
1 1
T
dt < £e1.t/ >
0 e .t/
£
P .x .t/ D 1/ D Z T Z T D (3.3)
1 1 1 1 < £e1.t/ > C < £c1.t/ >
T
d tC T dt
0 £e .t/ 0 £c .t/
The case of interest for most practical applications is for the noise at frequencies
below the frequency of the cyclo-stationary excitation signal.
If the probability of a trap to switch state during one period T of the cyclo-
stationary excitation signal is very small, a simplification may be done in the
calculation of the autocorrelation. This case corresponds to the limit where c (t)
and e (t) are much larger than the period T, leading to small transition probabilities
T/ c (t) and T/ e (t).
In this case we can write, without loss of generality, that s D nT, where n is a
positive integer (1, 2, 3, : : : ). In this situation we have
Z s n1 Z .iC1/T
X Xn1 Z .iC1/T
1 1 1 1
C dy D d yC dy D
0 £e .y/ £c .y/ iD0 iT
£e .y/ £c .y/
iD0 iT
(3.5)
leading to
" #
< £ 1.t/ > T< £ 1.t/ > -s < £ 1.t/ >C< £ 1.t/ >
A .s/ D e
< £ 1.t/ >C< £ 1.t/ >
1 e e e c C
e c T < 1.t/ >C< 1.t/ >
£e £c
e 1
(3.6)
< £ 1.t/ > T< £ 1.t/ >
C < £ 1.t/ >C< £ 1.t/ >
e e
e c T < 1.t/ >C< 1.t/ >
£e £c
e 1
This means that, if c (t) and e (t) are much larger than the period T of the excitation
signal, the values of 1/£c (t) and 1/£e (t) in the integrals of (3.4) are equivalent to their
time averages <1/£c (t)> and <1/£e (t)>.
The power spectrum Si (!), due to a single trap (the i-th trap), is the calculated as
the Fourier transform of the autocorrelation, leading to
Z 1
1
Si .!/ D A.s/e i!s ds (3.8)
2 1
3 Charge Trapping Phenomena in MOSFETS: From Noise to Bias. . . 29
Noise PSD
10-1
10-2
10-2 100 102
βeq
ıi2 ˇeq 1 1
Si .!/ D (3.9)
1 C ˇeq 2 !i 1 C .!=!i /2
Here •i determines the trap’s voltage amplitude and ¨ i the angular frequency, and
1 1
¨i D < >C< > (3.11)
£c .t/ £e .t/
This takes us to the conclusion that making a RTN signal cyclo-stationary leads to
a Lorentzian spectrum with corner frequency equal to the sum of the inverse time
average values of the capture and emission times. Please note that for stationary
RTN the corner frequency is equal to the sum of the inverse values of the constant
capture and emission times. Please refer to equation 9 in [12].
The result for this limit is valid for any kind of periodic excitation. This is the
limit studied in [11, 13]. However, for this limit, we obtain the same result in a much
simpler derivation than in [11, 13], and without making any further assumption or
simplification. The single assumption is that the transition probabilities dt/£e (t) and
dt/£c (t) are much smaller than the excitation period T.
Equations (3.11) and (3.9) are a generalization of the Machlup formulation for
cyclo-stationary RTN with excitation frequency higher than the noise frequency.
30 G. Wirth and R. da Silva
The Machlup equations for the auto-correlation and power spectrum are recovered
if we consider <1/£e (t)> D 1/£e and <1/£c (t)> D 1/£c , i.e., constant, not time
dependent values.
The noise behavior of a device results from the combined effect of all traps found in
the device. The noise power spectrum may then be written as the summation of the
contribution of each one of the Ntr traps found in the device. The observation that
traps are Poisson distributed results in an average noise spectral density given by
1
X X
Nt r
e N NtNr
<S > D <Si > D N <Si > (3.12)
Nt r D0 iD1
Nt r Š
Here <Si > is the average noise contribution of a trap, Ntr is actual number of traps
in a particular device, and N is average number of traps in an ensemble of devices.
As depicted in Fig. 3.8, if the corner frequencies of the Lorentzians correspond-
ing to power spectral density (PSD) of different traps are equally spaced on a log
scale, the summation of the power spectrum due to all traps leads to 1/f noise. In
time domain, this corresponds to the situation where the charge trap characteristic
time constants are uniformly distributed on a log scale.
The equations above are valid for any waveform of the periodic excitation signal.
In order to explore the noise behavior in detail and allow comparison to experimental
results, a case of particular interest for practical applications will be studied. It is
square wave excitation.
Square wave excitation is chosen as case study, because of its interest in practical
applications and because of the availability of experimental data. Comparison of
model results to relevant experimental data is performed in the last section of this
work.
Under square wave excitation, the bias voltage abruptly alternates between two
states, called on and off. Please note that the state names on and off do not imply that
the device has necessarily to be periodically turned on and off. The names refer to
two distinct states, with different gate bias. With periodically changing gate bias, the
Fermi level becomes a periodic function of time EF (t). This implies that the Fermi
level alternates between two levels: Eon being the Fermi level during the on state
and Eoff being the Fermi level during the off state. The duty cycle ˛ is the fraction
3 Charge Trapping Phenomena in MOSFETS: From Noise to Bias. . . 31
102
101
100
10−1
10−2
S(f)
10−3
10−4
10−5
10−6
10−7
10−2 10−1 100 101 102 103 104 105 106
f(Hz)
Fig. 3.8 If the corner frequencies of the Lorentzians (red lines) corresponding to different traps
are equally spaced on a log scale, the summation of the power spectrum due to all traps leads to
1/f noise (blue line)
of the period T in which the device is in the on state. The capture and emission time
constants of a trap are affected by the Fermi level.
For square wave cyclo-stationary excitation with duty cycle ’, the time averaged
capture and emission time constants may be written as
<1=c > D ˛=c;on C .1 ˛/ =c;off (3.13)
<1=e > D ˛=e;on C .1 ˛/ =e;off (3.14)
ˇeq D Eon ; Eoff ; ˛ e 2Et =kB T (3.15)
with
where W is the channel width and L the channel length of the transistor, and Ndec is
the trap density per unit area and frequency decade. Please see (3.1) for a detailed
explanation of these parameters. This equation again looks similar to the DC noise
behavior (see [1]). In case the cyclo-stationary period T is short compared to the
trap’s time constants their time averages and the trap numbers close to the two
Fermi levels determine the noise level. At frequencies significantly higher than the
excitation frequency the noise is given by DC theory.
Equation (3.17) above clearly relates the noise reduction to the distribution of
traps over energy. From the above equation, it follows that if the distribution of traps
in energy over the bandgap is uniform, <S> is expected to remain approximately
constant under cyclo-stationary excitation. In this case, the peak of the noise
contribution of a trap always probes the same trap density, since the trap density
around to the Fermi level is always the same. Note that for the peak of the noise
contribution of a trap occurs for ET D EF . If the trap density is uniform over the
bandgap, g(Et ) corresponding to the peak of “eq is always the same.
Monte Carlo (MC) simulations are performed in time domain. The RTN of each
trap is evaluated, and then the power spectral density is calculated. The Monte Carlo
simulation show very good agreement to the analytical results, as well as to our
experimental results, as shown in the last section of this work.
In Fig. 3.9 the results of Eq. 3.17 show noise reduction as a function of the Fermi
levels in the ‘on’ and ‘off’ state at frequencies lower than the excitation period.
0.02
S
0.015
[a.u]
0.01
0.005
1
0.8 Eoff=f(VGS,off)
0.6
0.4 1
Eon=f(VGS,on)
Fig. 3.9 Noise reduction as a function of the Fermi level in the ‘on’ and ‘off’ states. The noise
power S is evaluated according to Eq. (3.17), considering the U-shaped trap density given by
Eq. (3.18). The same parameter a D 11 is used in all figures and evaluations performed in this
work. Note that the noise reduction is larger if the biasing levels are symmetrical in relation to the
center of the U-shaped trap density
3 Charge Trapping Phenomena in MOSFETS: From Noise to Bias. . . 33
It can be seen that a higher noise reduction can be expected if the biasing levels are
symmetrical in relation to the center of the U-shaped trap density. In this case, the
traps that contribute most to the noise power are the ones close to the center of the
bandgap, where the density is lowest.
In good agreement to the works in [11, 13], the trap energy distribution g(Et )
is key for explaining experimentally observed findings. A parabolic U-shape trap
density function is assumed here:
Since the microscopic approach maintains the statistically relevant parameters, the
model proposed here also allows the modeling of the statistically relevant param-
eters. As derived in [2], the normalized standard deviation of noise performance
under constant biasing is
2.5
2
g(ET) [a.u.]
1.5
r
¢np 2 1 <A4 > b
D p (3.20)
<npBW > Ndec WL <A2 >2 fH c
fL
Here fH and fL are the lower and upper boundaries of the bandwidth of interest in
a given circuit design, respectively, and b and c are constants, with b D 0.74 and
c D 0.05 [2].
A similar formulation can also be derived for cyclo-stationary operation. The
standard deviation of noise power is given by:
p
S D <S 2 > <S >2 (3.21)
Hence, in addition to <S>2 based on Eq. (3.17), it is necessary to evaluate <S2 >.
Again following a similar approach to [12], the resulting normalized standard
deviation under square wave excitation is given by:
Z 4Et =kB T
! 12
Ec 2
q e
<ı 4 > 4 g .Et / dEt
S <ı 2 >2 Ev 1 C e 2Et =kB T
Dp Z Ec
(3.22)
<S > Ndec W L e 2Et =kB T
2 g .Et / dEt
Ev 1 C e 2Et =kB T
Figure 3.11 shows the normalized standard deviation as a function of the Fermi
levels in the ‘on’ and ‘off’ state for the U-shaped trap density given by (3.18). The
normalized standard deviation becomes higher when the traps that contribute most
to noise are the ones close to the center of the band gap. In this case the normalized
standard deviation increases even though the average noise power decreases.
relative error of S
10
8
6
4 0.4
0.6
1 0.8 Eoff=f(VGS,off)
0.8 0.6 1
0.4
Eon=f(VGS,on)
Fig. 3.11 Normalized standard deviation of noise performance as given by Eq. (3.22). An increase
in normalized standard deviation is seen under cyclo-stationary excitation, especially for conditions
where the average cyclo-stationary noise reduction is high
3 Charge Trapping Phenomena in MOSFETS: From Noise to Bias. . . 35
8 Experimental Results
The low frequency noise was experimentally investigated using short and long
channel NMOS devices from a 45 nm technology. Technology details are described
in [15]. The DC and large signal noise was measured using a semi-automatic prober
setup using a BTA noise system, and the results are described in [3]. A square wave
with 50 % duty cycle (’ D 0.5) was applied to the gate using a waveform generator
HP33120A for measuring cyclo-stationary noise. Two types of cyclo-stationary
measurements were performed, one with VGS,OFF D 0 V and VGS,ON D VT , and the
other with VGS,OFF D VT and VGS, ON D VDD . Note, that the effect of modulation for
a signal modulated by a square wave with 50 % duty cycle is a reduction of noise
power (in baseband) by a factor of four.
After the standard modulation theory, for square wave excitation, switching
operation can be represented by the multiplication of the noise with a square-wave
signal with 50 % duty cycle, m(t), as follows [19]:
1 2 2 2
m .t/ D C sin !sw t C sin 3!sw t C sin 5!sw t C : : :
2 3 5
In frequency domain this corresponds to a convolution of the noise PSD with a
spectrum with delta functions at dc, ¨sw , 3¨sw , 5¨sw , and so forth. The dc-term
determines the resulting noise power in baseband, which is (1/2)2 or 6 dB, if
compared to the original noise power.
However, the measurements results did show a much larger noise reduction. The
noise reduction was larger than a factor of ten, and the mean value of the LF-noise
under DC conditions was seen to follow 1/f behavior, but with deviations at some
frequencies, showing Lorentzian components [3]. For cyclo-stationary conditions
these deviations were more pronounced in agreement to our theory. Also in good
agreement to our theory, the noise reduction factor was found to be almost constant
if only measured results below the excitation frequency are considered.
To compare theory and experiment, the average number of traps for all possible
energetic levels in the bandgap were extracted from average noise behavior. The
model comparisons are based on the total number of traps and the assumption of a
U-shaped trap density with a D 11 (see Eq. 3.18).
Experimental results did show that whereas the average LF noise is decreasing
under cyclo-stationary conditions, the normalized standard deviation of noise power
increases. As shown in Fig. 3.12, this again is in good agreement to the model
presented here. In Fig. 3.12, the model values for mean and upper spec in cyclo-
stationary conditions are given by Eqs. 3.17 and 3.22, respectively. Here the upper
boundary for the DC noise is evaluated using the model presented in [1].
36 G. Wirth and R. da Silva
MEAN (MEAS)
UPPER SPEC (MEAS)
MEAN (SIM )
UPPER SPEC (SIM)
10-17
WL SID [A2 mm2 /Hz]
10-18
10-19
DC CYCLO DC CYCLO
L = 0.04 mm L = 1 mm
CHANNEL LENGTH
Fig. 3.12 Area normalized average and C3¢ worst case noise behavior under DC and CS
conditions. All data are taken at 10 Hz noise frequency. Open symbols are data from simulation
and closed symbols are measured data reported in [3]
Above we discussed the low-frequency noise originated from trapping and de-
trapping at localized states (charge traps) at the interface or in the gate dielectric
of transistors. Here we discuss the role of charge trapping in Bias Temperature
Instability (BTI). It is widely accepted that charge trapping plays a role in the
threshold voltage shifts (VT ) produced by Bias Temperature stress. It was reported
that a significant fraction of the threshold voltage shift is recovered spontaneously
once the Bias Temperature stress is removed [20–22]. Although first observed
decades ago, the phenomenon still remains controversial in both experimental and
theoretical terms.
There is a vast literature on Negative Bias Temperature Instability (NBTI) in
PMOSFETs, where most models are based on reaction diffusion. Reaction diffusion
models involve breaking of Hydrogen-Silicon bonds at the Silicon-Gate Dielectric
interface, related to the trapping of inversion layer holes, with the release and
diffusion of a hydrogenic species. Recovery (relaxation) is assumed to occur with
re-bonding of the Hydrogen-Silicon bonds, i.e., anneal out the interface traps [20].
Although reaction diffusion models have been very useful and successful, some
aspects of NBTI are difficult to be fully explained in a reaction diffusion framework,
as for instance the fast recovery which occurs if bias stress is removed [20]. Other
3 Charge Trapping Phenomena in MOSFETS: From Noise to Bias. . . 37
where p11 (dt) D 1p10 (dt) D 1dt/£e and P00 (t) D 1P01 (t). This leads to a simple
differential equation. The solution of this differential equation is [18]:
P01 .t / D 1 – exp t=£eq £e = .£c C £e / (3.24)
38 G. Wirth and R. da Silva
where 1/£eq D 1/£c C 1/£e . A similar evaluation can be performed for P11 (t),
leading to
P11 .t / D £e C £c exp t=£eq = .£c C £e / (3.25)
For a device which has Ntr traps, we can write the number of occupied traps at time
X
Ntr
t, n(t), as being n.t / D i .t /, where ™i (t) can assume the values 0 or 1. We can
iD1
then evaluate the average number of traps occupied at time t as being:
X1 N N t r e N XN t r
hn.t /i D k P .k; t/ (3.26)
N t rD0 Nt rŠ kD0
where NNtr e-N /Ntr ! is the probability that Ntr traps are found in a device, i.e., the
number of traps is assumed to be Poisson distributed, with parameter <Ntr > D N.
While P(k, t) is the probability that k traps are occupied at time t, with k D 0 : : :
Ntr . The averaging must be performed over the successive stochastic capture and
emission events of a trap, and over the number of traps of the ensemble of devices.
The average threshold voltage fluctuation is then obtained by multiplying (3.26)
by <•>, which is the average fluctuation due to a single trap. This leads to:
Since traps have different time constants, which are statistically independent and
identically distributed random variables, for considering P(k, t) in Eq. (3.26) we
evaluate the average probability over different traps by writing a binomial. In this
case
Ntr k Ntr-k
P .k; t/ D P01 .£c ; £e ; t/ P00 .£c ; £e ; t/ (3.28)
k
“
where P01 .£c ; £e ; t/ D d £c d £e P01 .£c ; £e ; t/ f .£c ; £e / and P00 .£c ; £e ; t/ D
“
d £c d £e P00 .£c ; £e ; t/ f .£c ; £e /, with f(£c , £e ) being the joint probability function
of time constants £c and £e over all traps.
£c and £e are random variables, whose values depend on temperature and bias
point, and which we assume to follow [14]:
h i
£c D 10p 1 C exp q (3.29)
where p 2 [pmin , pmax ]. Please note that pmin and pmax define the times constants
of the fastest and the slowest trap, respectively. This also limits the time interval
in which the model here proposed is valid, in a similar way as in the analysis of
low-frequency noise made above. Since p is assumed to be uniformly distributed,
the charge trap characteristic time constants are uniformly distributed on a log
scale. Again, this is the same assumption done in the low-frequency noise analysis
above. The variable q is given by q D (ET –EF )/kB T 2 [(EV –EF )/kB T, (EC –EF )/kB T],
where EC is the conduction band edge, while EV is the valence band edge. ET is
the energy level of the trap. Consequently, £c and £e are temperature dependent.
The assumption of the existence of individual defects with a very wide distribution
of time constants is also in line with recent NBTI data [20–24]. The assumption
of a single p, governing both£c and £e , may be restrictive and deserves deeper
investigation in future works, both theoretical and experimental. Nevertheless, we
did run numerical analysis (Monte Carlo simulations) assuming independent p
values for capture and emission. The BTI behavior is observed to stay essentially
the same (no qualitative deviation on the BTI behavior is observed), as long as the
p values remain uniformly distributed.
For the evaluation of Eq. (3.26), we consider Eqs. (3.28), (3.29) and (3.30). In
evaluating the average over the different random variables, we separate the average
over number of traps
X1 N N t r e N
h. /i D . /
N t rD0 Nt r Š
Please remember that the time constants £c and £e depend on Fermi level, trap
energy level, and temperature, as given by (3.29) and (3.30). Evaluation the averages
in this manner facilitates the analytical analysis.
This leads to:
X1 N N t r e N
hn.t /i D P01 .£c ; £e ; t/ Nt r D N P01 .£c ; £e ; t/
N t rD0 Nt r Š
Z Z !
10pmax t
N Ec
g .ET / dET .e u 1/
D du
ln10 .pmax pmi n / Ev 1 C e .ET EF /=kB T 10pmi n t u
(3.31)
40 G. Wirth and R. da Silva
where g(ET ) describes the trap energy distribution in the band-gap, and in the second
integral a change of variable was made, p D log (u/t), dp D du / (u ln 10). N is
the average number of traps found in a device.
In Eq. (3.31), the first integral contains the Fermi level and temperature depen-
dence, while the second integral contains the time dependence. This means that this
equation has the time dependence separated from the Fermi level (i.e., bias point,
since the Fermi level is defined by the bias point) and temperature dependence.
Hence, the model predicts that for measurements carried out at different tempera-
tures, there are scaling factors that can be used as a multiplicative coefficients for
the threshold voltage shift, making the curves for different temperatures to overlap
(at all measured times). The same applies for measurements carried out at different
Fermi levels, i.e., different stress voltages (bias points). Note that the Fermi level
is a function of the applied voltage. Furthermore, since Eq. (3.31) is valid for both
stress and recovery phases, it implies that the temperature and voltage dependence
(scaling factor) during stress and during recovery is the same. This is very relevant
and in agreement to experimental data, as discussed below.
Equation (3.31) can be evaluated numerically, leading to:
where A and B are constants, and the last term clearly shows that the time evolution
of number of occupied traps shows a log(t) behavior. The term ®(T, EF ) describes the
temperature and Fermi level dependence. Please note that (3.32) above is of the same
form as equation (4) in [23], which was empirically written, as an approximation for
the experimentally observed behavior.
The actual form of the term ®(T, EF ) depends on the trap energy distribution in the
band-gap g(ET ). g(ET ) is usually found to be a convex curve (e.g. U-shaped); see, for
instance, [14, 15]. In this work, a U-shape trap density function is investigated. As
in the case of low-frequency noise analysis for abruptly changing gate bias, the trap
energy distribution g(ET ) is key for explaining experimentally observed findings.
Equation (3.32) models both stress and recovery phases. Hence, both stress and
recovery phases of BTI are described by the same Eq. (3.32). If the density of
initially occupied traps is lower than the value expected for the bias point, the
number of occupied traps increases logarithmically. This corresponds to stress phase
of BTI. On the other hand, if the density of initially occupied traps is higher than
the value expected for the bias point, the density of occupied states decreases
logarithmically. This corresponds to the recovery phase of BTI. Besides analytical
analysis and evaluation, we did run Monte Carlo simulations to confirm this
behavior (Monte Carlo simulations were performed starting from different numbers
of initially occupied traps). Some of the results are presented in Fig. 3.13.
The model here presented considers that the number of traps is constant, i.e.,
there is no trap generation or annihilation. For stress or recovery over long time
intervals, there may be generation of traps during the stress phase, or annihilation
of traps (annealing) during the recovery phase. In this case the average number of
traps <Ntr > in Eq. (3.26) becomes a function of time, <Ntr (t)>. If the number
3 Charge Trapping Phenomena in MOSFETS: From Noise to Bias. . . 41
or traps increases with time during the stress phase, the evolution of number of
occupied traps may become faster than log(t) as time evolves, as experimentally
observed in many works, where the time dependence is found to follow a power
law for long stress times [20, 21]. If a proper equation for <Ntr (t)> is available, the
modeling approach here presented can be used to evaluate this behavior. Modeling
Ntr as a function of time is out of the scope of this work. This work focus on the
log(t) behavior observed for recovery and short stress times, as discussed in the
experimental session below.
For numerical analysis and Monte Carlo simulations performed in this work
(pmax pmin ) was chosen to be 7, i.e., it is assumed that p is uniformly distributed
over seven decades. The behavior does not depend on pmax and pmin , as long as the
time being considered is much longer than the shortest time constant, and much
shorter than longest time constant.
Now we discuss the situation that we call cyclo-stationary excitation, where the
gate bias of the device is periodically switched.
In square wave cyclo-stationary excitation, the bias voltage abruptly alternates
between two states, called on and off. In the context of this work, the on state refers
to the state where stress is applied, and off refers to the state where no stress is
applied. The fraction of the period T in which the device is in the on state is ’ (i.e.,
the duty cycle). Figure 3.14 depicts this situation.
With periodically changing gate bias, the Fermi level becomes a periodic function
of time EF (t). As seen from Eqs. (3.23) to (3.28), the parameters that govern
the charge trapping behavior are the capture and emission time constants, c and
e , respectively. The behavior of these time constants under cyclo-stationary was
already discussed above in the context of Low-Frequency Noise. It has been shown
that as long as the cyclo-stationary period time T is short compared to the trap’s
time constants, the behavior of each trap is governed by an effective capture ( c,eff )
and emission ( e,eff ) time constant. The effective values are evaluated as being the
time average values over one excitation period:
42 G. Wirth and R. da Silva
Fig. 3.14 This figure depicts the behavior of slow traps under switched bias. If the gate bias is
periodically alternated between on and off, the Fermi Level (EF ) changes accordingly. If trap time
constants are much longer than the on (stress) and off (recovery) cycle, traps are not able to follow
device bias, and act according to equivalent time constants. For these traps, which are too slow to
follow bias point change, the behavior is described by the equivalent time constants given by Eqs.
(3.13) and (3.14)
Z T
1 1 1
D dt (3.33)
c;eff T 0 c
Z T
1 1 1
D dt
e;eff T 0 e
Under square wave excitation the effective time constants become a simple function
of the duty cycle ’, as given by Eqs (3.13) and (3.14) above.
This formulation shows that there is no dependence of the effective time con-
stants on the frequency of the cyclo-stationary excitation, as long as the excitation
period T is short compared to the trap time constants.
Using the formulation for the equivalent time constants, Eq. (3.28) becomes
Ntr k Ntr-k
P .k; t/ D P01 .£c; eff ; £e;eff ; t/ P00 .£c;eff ; £e;eff ; t/ (3.34)
k
Because the Fermi level is now a function of time, it is not possible to derive an
analytical closed form expression as Eq. (3.31). However, numerical analysis, as
well as Monte Carlo simulations, show that also under cyclo-stationary excitation
the behavior has the form of Eq. (3.32), i.e., there is one term that describes the log(t)
behavior, multiplied by another term that describes the temperature and Fermi level
dependences. Furthermore, it can be shown below that a simple equation to write
an effective Fermi level may be used, leading to a simple model, suitable for circuit
simulation purposes.
Figure 3.15 shows the results of MC simulation where the switching bias is fast
if compared to the trap time constants of all traps considered in the simulation.
In this simulation, time constants of all traps are at least one order of magnitude
longer than the period of the square wave excitation. In this figure, the dotted
3 Charge Trapping Phenomena in MOSFETS: From Noise to Bias. . . 43
12
10
8
ΔVT (a.u)
0
102 103
Time (a.u.)
Fig. 3.15 Behavior of slow traps under square wave excitation. Steady increase of VT is seem,
causing a BTI degradation which follows a log(t) behavior. Dotted (blue) line is detailed Monte
Carlo simulation, considering usual time constants. Solid (red) line is Monte Carlo simulation
using equivalent time constants, as given by Eqs. (3.13) and (3.14)
(blue) line is the result of the detailed MC simulation of the square wave excitation,
applying the usual time constants during the stress (on) and recovery (off ) phases.
The solid (red) line is MC simulation using equivalent time constants, as given by
Eqs. (3.13) and (3.14). In this simulation the same equivalent time constants are
applied during the whole simulation. In other words, it is similar to a DC simulation
(no switching bias), where the equivalent time constants are applied during the
whole simulation time. Slow traps lead to a degradation component which steadily
increases with time, with a log(t) behavior. In this MC simulation all traps are
assumed to be initially empty. Note that the slow traps do not follow the switching
bias. Steady increase in VT is seen. These MC simulations confirm that the behavior
is determined by the equivalent time constants, as given by (3.13) and (3.14).
Equation (3.34) can be solved numerically, but this approach is not practical
for circuit simulation purposes. A simpler approach, considering and equivalent
Fermi level EF,eff , which is a function of duty cycle, can fit the experimental data.
In this approach, empirically obtained after insights from extensive Monte Carlo
simulations and numerical analysis, EF,eff is written as the time average of the Fermi
level. For the case of square wave excitation with duty cycle ’ we have:
With this empirical approach, the same Eq. (3.31) can model both steady bias
and cyclo-stationary excitation. Monte Carlo numerical analysis confirms that the
empirical formulation fits the numerical solution of (3.34).
44 G. Wirth and R. da Silva
Conclusion
This chapter discussed the role of charge trapping in low-frequency noise
and bias temperature instability (BTI). A statistical model for the charge
trapping phenomena in MOS devices was presented. The model is based
on discrete microscopic device physics parameters. Besides evaluating the
average behavior, the model here proposed allows the derivation of statistical
relevant parameters. The model is applied to study the random telegraph noise
(RTN) and Bias Temperature Instability (BTI).
The impact of charge trapping is modeled as momentary changes in
threshold voltage (or drain current). In analog circuits, this may lead to low-
frequency noise and be a source of jitter in oscillators. In digital circuits it
may lead to aging effects and transient effects, since circuit behavior may
change due to transient VT fluctuations between two logic operations of a
digital circuit.
The modeling approach focuses on operation conditions relevant for digital
and analog design, including large signal AC operation.
It is shown that if the density of traps in the band-gap is a convex curve
(“U”-shaped) a reduction in noise power may be achieved under cyclo-
stationary excitation. However, the variability in noise behavior (normalized
standard deviation of noise power) is shown to increase. This increase in
variability will be another challenge to analog and RF circuit designs in deep
sub-micron CMOS technologies.
Regarding BTI, an analytical model for both stress and recovery phases
of BTI is presented. Furthermore, the model properly describes device
behavior under periodic switching, also called AC-BTI or cyclo-stationary
operation. It is shown that a universal logarithmic law describes the time
dependence of charge trapping in both stress and recovery phases, and that
the time dependence may be separated from the temperature and bias point
dependence.
Acknowledgments Gilson Wirth and Roberto da Silva thank CNPq and FAPERGS for research
grants that partially support the work reported here.
References
3. G Wirth, R da Silva, P Srinivasan, J Krick and R Brederlow. “Statistical model for MOSFET
low-frequency noise under cyclo-stationary conditions” Int Electron Dev Meeting - IEDM
2009, p. 30.5.1-4, (2009).
4. G Wirth and R da Silva, “Low-Frequency Noise Spectrum of Cyclo-Stationary Random
Telegraph Signals”, Electrical Eng., 90, p. 435–41 (2008).
5. R da Silva, G Wirth and L Brusamarello. “An appropriate model for the noise power spectrum
produced by traps at the Si SiO interface: a study of the influence of a time-dependent
Fermi level. Journal of Statistical Mechanics” Journal of Statistical Mechanics. Theory and
Experiment, 2008, p. P10015 (2008).
6. R Brederlow, J Koh, G Wirth, R da Silva, M Tiebout, R Thewes. “Low Frequency Noise
Considerations for CMOS Analog Circuit Design” Proceedings of the 2005 International Conf
on Noise and Fluctuations (ICNF). p. 703–708 (2005).
7. R Brederlow, J Koh and R Thewes. “A physics-based low frequency noise model for MOSFETs
under periodic large signal excitation” Solid-State El., 50, p. 668–73 (2006).
8. I Bloom and Y Nemirovsky. “1/f noise reduction of metal-oxide-semiconductor transistors by
cycling from inversion to accumulation” Appl Phys Lett, 58, p.1664–6 (1991).
9. B Dierickx and E Simoen. “The decrease of ”random telegraph signal”noise in metal-oxide-
semiconductor field-effect transistors when cycled from inversion to accumulation”. J Appl
Phys, 71, p. 2028–2029 (1992).
10. M Ertürk, T Xia and W Clark. “Gate voltage dependence of MOSFET 1/f noise statistics”.
IEEE Electron Dev Let., 28, p. 812–814 (2007).
11. A van der Wel, E Klumperink, E Hoekstra and B Nauta. “Relating random telegraph signal
noise in metal-oxide-semiconductor transistors to interface trap energy distribution” Appl Phys
Lett., 87, p. 183507 (2005).
12. S Machlup, “Noise in Semiconductors: Spectrum of a Two-Parameter Random Signal” J Appl
Phys, 35, p. 341–343 (1954).
13. A Roy and C Enz, “Analytical Modeling of Large-Signal Cyclo-Stationary Low-Frequency
Noise with Arbitrary Periodic Input” IEEE Trans. Electron Dev., 54, p.2537-2545 (2007).
14. M Kirton and M Uren, “Noise in solid-state microstructures: A new perspective on individual
defects, interface states and low-frequency (1/f) noise” Adv. in Physics, 38, p. 367–468 (1989).
15. Ekbote, S.; Benaissa, K.; Obradovic, B.; Liu, S.; Shichijo, H.; Hou, F.; Blythe, T.; Houston,
T.W.; Martin, S.; Taylor, R.; Singh, A.; Yang, H.; Baldwin, G, “45nm Low-Power CMOS
SoC Technology with Aggressive Reduction of Random Variation for SRAM and Analog
Transistors” 2008 VLSI Tech. Symp., p. 160–161 (2008).
16. Wirth, Gilson I. ; da Silva, Roberto ; Kaczer, Ben. “Statistical Model for MOSFET Bias
Temperature Instability Component Due to Charge Trapping”. IEEE Transactions on Electron
Devices, p. 2743–2751 (2011).
17. Ashraf, Nabil ; Vasileska, Dragica ; Wirth, Gilson ; Srinivasan, P. “Accurate Model for
the Threshold Voltage Fluctuation Estimation in 45-nm Channel Length MOSFET Devices
in the Presence of Random Traps and Random Dopants”. IEEE Electron Device Letters,
p. 1044–1046 (2011).
18. Camargo, Vinícius V. A. ; Ashraf, Nabil ; BRUSAMARELLO, Lucas ; Vasileska, Dragica ;
Wirth, Gilson . “Impact of RDF and RTS on the performance of SRAM cells”. Journal of
Computational Electronics, p. 122–127, (2010).
19. E Klumperink, S Gierkink, A van der Wel, and B Nauta; “Reducing MOSFET 1/f noise and
power consumption by switched biasing ,” Solid-State Circuits, IEEE Journal of , vol.35, no.7,
pp.994-1001 (2000).
20. D. K. Schroder, “Negative bias temperature instability: What do we understand?”, Microelec-
tron. Reliab., vol. 47 no. 6, pp. 841–852, (2007).
21. S. E. Rauch, “Review and reexamination of reliability effects related to NBTI statistical
variations,” IEEE Trans. Device Mater. Rel., vol. 7, no. 4, pp. 524–530, (2007).
46 G. Wirth and R. da Silva
1 Introduction
The transistor mismatch due to random variations in process parameters has become
one of the major issues in deep-submicrometer technology. The term transistor
mismatch refers to the fact that supposedly identical transistors at the design phase
come out as distinct devices after manufacturing due to process variations. The
extrinsic factors include variation in the implantation dose and energy, oxidation
and annealing temperatures, etc. On the other hand, the intrinsic factors include
variations due to channel dopant number, interface, and fixed oxide charge. Both
the extrinsic and the intrinsic factors influence the transistor output performance
and yield. The transistor mismatch effect will become worse in the future due
to demanding requirement on process tolerance. Hence, there is a direct need to
develop a very good understanding on the impact of variation in a given unit process
on resulting variation in a given circuit parameter. This will enable the process
engineering and manufacturing team to define appropriate process monitor and
control criteria for all the unit processes involved. This will also help during the
technology development phase to select a particular process integration scheme
that could minimize the mismatch among various available options. In this book
chapter we first discuss the impact of discrete impurity effects on critical device
parameters (Sect. 2). In Sect. 3 we elaborate on the influence of the random telegraph
noise fluctuations on the variation in the threshold voltage and device on current.
Conclusions from this work are presented in “Conclusions” section.
Statistical fluctuations of the channel dopant number were predicted by Keyes [1]
as a fundamental physical limitation of MOSFETs down-scaling. Entering into
the nanometer regime results in a decreasing number of channel impurities whose
random distribution leads to significant fluctuations of the threshold voltage and
off-state leakage current. These effects are likely to induce serious problems on the
operation and performances of logical and analog circuits.
It has been experimentally verified by Mizuno and co-workers [2] that threshold
voltage fluctuations are mainly caused by random fluctuations of the number of
dopant atoms (RDF) and other contributions such as fluctuations of the oxide
thickness are comparably very small. It follows from these remarks that impurities
cannot be considered anymore using the continuum doping model in advanced
semiconductor device modeling but the precise location of each individual impurity
within a full Coulomb interaction picture must be taken into account.
To illustrate the importance of the discrete impurity effects on the threshold voltage
and the off-state current, in 1996 the group from ASU simulated a number of MOS-
FET device structures [3]. The corresponding potential profile under equilibrium
4 Atomistic Simulations on Reliability 49
Fig. 4.1 The role of discrete impurities in large devices (left panel) and in mesoscopic devices
(right panel)
Fig. 4.2 Top panel: Potential profile in a cut through the depth of the device. Bottom panel:
Potential profile in the semiconductor in a plane parallel to the semiconductor/oxide interface
conditions along the depth and in the plane parallel to the semiconductor/oxide
interface is shown in the top and bottom panels of Fig. 4.2. In our analysis we have
found that it is not only the total number of atoms within the discrete doping region
that matters, but the location of these atoms plays very important role.
Illustrated in Fig. 4.3, are the potential profiles and the current stream lines for
two impurity distributions. Note that the significant current crowding in the upper
panel near the critical source end of the channel leads to smaller drain current for
VD D 50 mV and a range of VG values. This, in turn, results in larger threshold
voltage for this device.
50 D. Vasileska and N. Ashraf
Fig. 4.3 Potential profile and current stream lines in a 50 nm gate-length MOSFET device. Shown
here are two impurity distributions in the device active region
In the past, the effect of discrete dopant random distribution in MOSFET channel
has been assessed by analytical or drift-diffusion (DD) approaches. The first DD
study consisted in using a stochastically fluctuating dopant distribution obeying
Poisson statistics [7]. 3D atomistic simulators have also been developed for studying
threshold voltage fluctuations [8, 9] Even though the DD/HD (hydrodynamic)
methods are very useful because of their simplicity and fast computing times, it
is not at all clear whether such macroscopic simulation schemes can be exploited
into the atomistic regime. In fact, it is not at all clear how such discrete electrons
and impurities are modeled in macroscopic device simulations due to the long-range
nature of the Coulomb potential. 3D DD/HD macroscopic models may be accurate
for modeling the threshold voltage fluctuations (since the device is in the off-state)
but they are definitely not accurate when examining the on-state current fluctuations.
Three-dimensional Monte Carlo (MC) simulations should provide a more realis-
tic transport description in ultra-short MOSFETs, in particular in the on-state. The
MC procedure gives an exact solution of the Boltzmann transport equation. It, thus,
correctly describes the non-stationary transport conditions. Even if microscopic
simulations such as the MC method are concerned, the treatment of the electrons
and impurities is not straightforward due to, again, the long-range nature of the
Coulomb potential. The incorporation of the long-range Coulomb potential in the
MC method has been a long-standing issue [10]. This problem is, in general,
avoided by assuming that the electrons and the impurities are always screened by the
other carriers so that the long-range part of the Coulomb interaction is effectively
suppressed. The complexity of the MC simulation increases as one takes into
account more complicated screening processes by using the dynamical and wave-
vector dependent dielectric function obtained from, for example, the random phase
approximation. Indeed, screening is a very complicated many-body matter [11].
A novel approach has been introduced by the ASU group [12], in which the MC
method is supplemented by a molecular dynamics (MD) routine. In this approach,
the mutual Coulomb interaction among electrons and impurities is treated in the
drift part of the MC transport kernel. Indeed, the various aspects associated with
the Coulomb interaction, such as dynamical screening and multiple scatterings,
are automatically taken into account. Very recently, the MC/MD method has
been extended for spatially inhomogeneous systems. Since a part of the Coulomb
52 D. Vasileska and N. Ashraf
interaction is already taken into account by the solution of the Poisson equation,
the MD treatment of the Coulomb interaction is restricted only to the limited area
near the charged particles. It is claimed that the full incorporation of the Coulomb
interaction is indispensable to reproduce the correct electron mobility in highly
doped silicon samples.
Although real space treatments eliminate the problem of double counting of the
force, a drawback is that the 3D Poisson equation must be solved repeatedly to
properly describe the self-consistent fields which consume over 80 % of the total
simulation time. To further speed up simulations, the ASU team has, for the first
time, utilized a 3D Fast Multi-pole Method (FMM) [13–16] instead. The FMM
allows calculation of the field and the potential in a system of n particles connected
by a central force within O(n) operations given certain prescribed accuracy. The
FMM is based on the idea of condensing the information of the potential generated
by point sources in truncated series expansions. After calculating suitable expan-
sions, the long range part of the potential is obtained by evaluating the truncated
series at the point in question and the short range part is calculated by direct
summation. The field due to the applied boundary biases is obtained at the beginning
of the simulation by solving the Laplace equation. Hence the total field acting on
each electron is the sum of this constant field and the contribution from the electron–
electron and electron–impurity interactions handled by the FMM calculations. The
image charges, which arise because of the dielectric discontinuity, are handled by
the method of images [17].
Next we present several success stories on the use of our 3D particle-based device
simulation code that properly takes into account the short range and the long range
Coulomb forces (Fig. 4.4). We begin with the example of energy relaxation of the
VD =1 V, VG = 1 V
300
200
100
channel drain
0
100 110 120 130 140 150 160 170 180
Length [nm]
Fig. 4.4 Average energy of the electrons coming to the drain from the channel. The applied bias
equals VD D VG D 1 V. Filled (open) circles correspond to the case when the short-range e–e and
e–i interactions are included (omitted) in the simulations
4 Atomistic Simulations on Reliability 53
carriers at the drain end of the MOSFET channel. The simulated device has channel
length LG D 80 nm, channel width WG D 80 nm and oxide thickness Tox D 3 nm.
The lateral extension of the source and drain regions is 50 nm. The channel doping
equals 3 1018 cm3 . The applied bias is VG D VD D 1 V. Only those electrons that
entered the channel region from the source side were “tagged” and their energy
and position was monitored and used in the average energy calculation. From the
average velocity simulation results, it follows that the short-range electron–electron
(e–e) and electron–impurity (e–i) interaction terms damp the velocity overshoot
effect, thus increasing the transit time of the carriers through the device; thus
reducing its cut-off frequency. It is also quite clear that when we use the mesh
force only, i.e. we skip the MD loop that allows us to correct for the short-range
e–e and e–i interactions, those electrons that enter the drain end of the device from
the channel never reach equilibrium. Their average energy is more than 60 meV
far into the drain region. Also, the average energy peaks past the drain junction.
The addition of the short-range Coulomb forces to the mesh force via the MD loop,
leads to rapid thermalization of the carriers once they enter the drain region. The
characteristic distance over which carriers thermalize is on the order of a few nm.
Second important example we have chosen in this paper is the quantitative
prediction of the threshold voltage fluctuations versus device gate width, channel
doping and oxide thickness, that are shown in Fig. 4.5. Also shown in this figure
are the analytical model predictions derived from experimental studies by Mizuno,
Okamura, and Toriumi [2], according to which
p ! p
4
q 3 "s b Tox 4 N
vt D p p ; (4.1)
2"ox Leff Weff
where N is the average channel doping density, ¥b is the built-in potential, Tox is the
oxide thickness, Leff and Weff are the effective channel length and width, and "s and
"ox are the semiconductor and oxide permittivity, respectively. Stolk, Widdershoven
and Klaassen [18] generalized the analytical result by Mizuno and his co-workers,
by taking into account the finite thickness of the inversion layer, depth-distribution
of charges in the depletion layer and the influence of the source and drain dopant
distributions and depletion regions. For a uniform channel dopant distribution, the
analytical expression for the threshold voltage standard deviation simplifies to
p ! p
4 4
q 3 "s b kb T 1 Tox N
vt D p p C p : (4.2)
3 q 4"s b Na "ox Leff Weff
In Eq. (4.2), the first term in the square brackets represents the surface potential
fluctuations whereas the second term represents the fluctuations in the electric field.
The decrease of the threshold voltage fluctuations with increasing the width of the
gate is due to the averaging effects, in agreement with the experimental findings by
Horstmann et al. [19]. We want to point out that we still observed significant spread
54 D. Vasileska and N. Ashraf
σVth [mV]
40
30
20
30
σVth [mV]
25
20
15 σVth => Equation (1)
10 σVth => Equation (2)
0
1x1018 3x1018 5x1018 7x1018
Doping density NA [cm-3]
c 100
σVth => Equation (1)
60
40
LG =WG =50 nm
20
NA =5x1018 cm-3
0
0 1 2 3 4 5
Oxide thickness Tox [nm]
4 Atomistic Simulations on Reliability 55
of the device transfer characteristics along the gate voltage axis even for devices
with WG D 100 nm. This is due to the nonuniformity of the potential barrier, which
allows for early turn-on of some parts of the channel. As expected, the increase in
the channel doping leads to larger threshold voltage standard deviation ¢VTH . These
results also imply that the fluctuations in the threshold voltage can be even larger in
devices in which counter ion implantation is used for threshold voltage adjustments.
Similarly, the increase in the oxide thickness leads to linear increase in the threshold
voltage standard deviation. The results shown in Fig. 4.5(a–c) also suggest that
reconstruction of the established scaling laws is needed to reduce the fluctuations
in the threshold voltage. In other words, within some new scaling methodology, Tox
should become much thinner, or NA much lower that what the conventional scaling
laws give.
To further elaborate on the role of Coulomb interactions, in Example 3, as
suggested by the results presented in Example 2, we consider 50 nm channel
length narrow-width SOI device with undoped channel. The results presented in
Fig. 4.6 also suggest that there are fluctuations in the device threshold voltage
for devices fabricated on the same chip due to unintentional doping and random
positioning of the impurity atoms. This can also be deduced from the scatter of the
experimental data from [20]. The simulation results of the transfer characteristics
with a single impurity present in different regions in the channel of the device,
shown in the top panel of Fig. 4.6, clearly demonstrate the origin of the threshold
voltage shifts for devices with 10 and 5 nm channel width. The width dependence
of the threshold voltage for the case of a uniform (undoped) and a discrete impurity
model is shown in the bottom panel of Fig. 4.6. These results suggest that both size-
quantization effects and unintentional doping must be concurrently considered to
explain threshold voltage variation in small devises.
b 0.6
Experimental: Lg = 250nm
0.5 Simulation: Lg = 50 nm
Simulation: Vt roll-off adjusted
0.4 Discrete single dopants
Threshold Voltage [V]
0.3
0.2
0.1
-0.1
-0.2
2 4 6 8 10 12 14 16
Channel Width [nm]
Fig. 4.6 Top Panel: Transfer characteristics of the device with 10 and 5 nm channel widths and
different location of the impurity atoms. Bottom Panel: Width dependence of the threshold voltage
for the case of a uniform and a discrete impurity model. Clearly seen in this figure are two trends:
(a) Threshold voltage increase with decreasing channel width due to quantum-mechanical size
quantization effects, and (b) Scatter in the threshold voltage data due to unintentional doping
at least readily encountered, the number of defects seem to reduce to less than 3
in number per square micron within an eV energy distribution for a device size
of W L D 50 nm 50 nm. The reason reliability concern did not arise in wide
gate area technology generation is because with a good number of traps lying
4 Atomistic Simulations on Reliability 57
within a few eV of Fermi energy, the spatial distribution of energy levels is more
tighter making the energy barrier values 4EB a fraction of an eV. Hence the carriers
trapped in traps easily reemit to inversion layers making RTS amplitude variation
almost nonexistent. But in today’s aggressively scaled device size, even though the
trap numbers are countable and sparse, these traps can be located deep within the
oxide with higher 4EB differential and a carrier once trapped in a trapped site,
may stay there for a prolonged period of time and never get reemitted to inversion
layer causing severe RTS amplitude drawbacks. The surface level trap lying close
to the interface can block the carrier flow in the channel by causing local potential
fluctuations where significant spread of RTS variation can be observed for even a
single trap closer to source side to mid-channel zone impeding the carrier flow. Also
one way carrier gets trapped and get detrapped is through tunneling from inversion
layer to a trap location and in earlier technological generation with thicker gate
oxide, tunneling was not as significant as it is today with the gate-oxide reaching
nanometric thickness. The conclusion is with the channel electrons being random,
presence of very few defects will suffice to cause notable RTS related device
operation failure for current ongoing technological generation of MOSFETs,
Figure 4.7 (top panel) illustrates the measurement data from a 90 nm SRAM
design [21]. The minimum supply voltage (Vccmin ), which is highly sensitive to
device threshold voltage, exhibits a similar pattern in the time domain as that of
RTF. The impact of RTF in this case is more than 200 mV, which is catastrophic to
the yield and low-power design of SRAM. Therefore, accurate and physical models
of RTF are essential to predict and optimize circuit performance during the design
stage. Currently, such models are not available for circuit simulation. The compound
between RTF and other sources of variation, such as RDF, further complicates the
Fig. 4.7 Top panel: The fluctuation of SRAM Vccmin due to RTF. Bottom panel: A random
telegraph signal (namely x(t)) produced by a carrier trap
58 D. Vasileska and N. Ashraf
situation especially in extremely scaled CMOS design. Figure 4.7 (bottom panel)
shows a typical two level RTS signal time domain characteristics.
In this research work the integration of random defects positioned across the channel
at the Si/SiO2 interface from source end to the drain end in the presence of different
random channel and bulk dopant distributions are used to conduct Ensemble
Monte-Carlo (EMC) based numerical simulation of key device performance metrics
for 45 nm gate length MOSFET device. The two main performance parameters
that affect RTF based reliability measurements are percentage change in relative
drain current fluctuation, particularly in the saturation region where most digital
circuits operate and percentage change in threshold voltage that affects the device
transconductance and on-current drive.
The simulator described in the previous section is presently being used in the
investigation of the random trap fluctuations in 45 nm technology node MOSFET
device where, in addition to the randomness of the position and the actual number of
the impurity atoms in the whole simulated domain of the device, a random double-
charged trap is introduced in the middle section of the channel and moved from the
source-end to the drain end of the channel. An example of a discrete impurity pattern
and a double trap located at the source-end of the channel is shown in Fig. 4.8. The
effective channel length of 45 nm technology node is taken to be 35 nm.
We consider ensemble of 20 devices with different random dopant distribution.
The threshold voltage of each of these 20 devices without the presence of the trap is
shown in Fig. 4.9.
The total variation of the threshold voltage as a function of the double trap
position in the middle portion of the channel, when moved from the source end to
the drain end of the channel, is shown in Fig. 4.10. We see that the threshold voltage
increases from its average value when the double trap is located at the source end of
the channel. This is due to the fact that carriers see additional large potential barrier
due to the presence of the charged double trap and are reflected back in the source
contact. The threshold voltage reduces when the double trap is moved away from the
source injection barrier because when the electrons are injected in the channel, even
though the electric field is small (due to small drain bias applied when measuring
threshold voltage), they slowly drift towards the drain contact.
In Fig. 4.11 we depict the threshold voltage standard deviation variation as a
function of the double trap position when the double trap is being moved from the
middle of the source end of the channel to the middle of the drain end of the channel.
An explanation of the results given in Fig. 4.11 is schematically shown in Fig. 4.12.
At threshold voltage, the sheet electron density in the channel is small, therefore
screening is not important. Traps near the source end of the channel have the largest
4 Atomistic Simulations on Reliability 59
35
Distance [nm]
30
25
20
15
10
0
0 5 10 15 20 25 30 35 40 45
Distance [nm]
Trap position
Fig. 4.8 Random dopant distribution and a double trap located in the middle of the source end of
the channel
Threshold voltage Vt [V]
0.25
0.2
0.15
0.1
0.05
0
0 5 10 15 20
Dopant Configuration
Fig. 4.9 Threshold voltage fluctuations due to random dopant fluctuations (without traps) for a
statistical ensemble of 20 devices with different number and different distribution of the impurity
atoms
60 D. Vasileska and N. Ashraf
0.19
0.16
0.15
0.14
0.13
0.12
0.11
0.1
0 5 10 15 20 25 30 35 40
distance along the channel [nm]
Fig. 4.10 Threshold voltage variation with trap position variation. These results are averages over
the ensemble of 20 devices
45
Simulated data
cubic interpolation
40
percentage Vt variation
35
30
25
Trap position (b)
20
Fig. 4.11 Percentage threshold voltage fluctuation due to a double trap located at the semiconduc-
tor/oxide interface and different position along the middle section of the channel. Twenty devices
with different random dopant distributions have been averaged out. Detailed description of the
trend of the results shown in this figure is given in Fig. 4.12
4 Atomistic Simulations on Reliability 61
a b
Sheet density Sheet density
distance distance
VD small VD small
10
channel
6
2
VDs = 0.7 V, VGs = 0.8 V
0
0 10 20 30 40
distance along the channel [nm]
Fig. 4.13 ON-current degradation vs. trap position. The statistical ensemble used here consists of
the first seven devices from Fig. 4.9 with different number and different distribution of the impurity
atoms
influence since they are major obstacles to the electrons because of the large input
barrier depicted in case (a) shown on the left panel of Fig. 4.12. Traps near the
drain end of the channel have smaller influence since electrons are accelerated by
the small electric field—case (b) shown on the right panel of Fig. 4.12.
Figure 4.13 shows the ON-current degradation as a function of the double trap
position. As depicted on the figure, near the source end of the channel the current
degradation due to the presence of a negatively charged double trap is large because
the double trap introduces additional barrier for the current flow. When the double
trap is in the middle section of the channel, the current degradation is smaller.
Traps near the drain contact, where the electron density is pinched off for the bias
62 D. Vasileska and N. Ashraf
Sheet Sheet
density density
distance distance
(a)
(a)
(b)
(b)
VD small
VD large
Fig. 4.14 Under small gate and drain bias, we have the situation depicted in Fig. 4.12. For large
gate bias: (a) the sheet density increases which means that screening increases; (b) traps near
the drain are surrounded with large number of electrons for small VD , therefore screening of the
Coulomb potential is large and there is smaller degradation of the current; (c) for large VD traps
near the drain are surrounded with smaller number of electrons, therefore screening is smaller and
we have slightly larger ID degradation
conditions used, are not effectively screened and a notable increase of the current
degradation is observed. At the drain contact, the degradation drops practically to
zero because there traps are effectively screened by the electrons. The expected
current trends under small and large drain voltages are explained in Fig. 4.14.
4 Conclusions
Unlike most of the analytical revelation of published articles, where it has been
reported that fluctuation in drain current amplitude variation and threshold voltage
variation tend to diminish at strong inversion and saturation conditions imposing
higher gate and drain bias owing to the screened out potential due to high inversion
charge density at the surface with associated improvement of Coulombic-scattering
related mobility, our simulations conducted at saturation bias conditions on a 45 nm
MOSFET reveal that for different random dopant distributions in the channel
and bulk, the fluctuation pattern exhibited by drain current amplitude variation
and threshold voltage variation are truly statistical and random in nature, i.e., for
some specific random dopant distributions, the fluctuation nature is well controlled
whereas for some other random dopants, the fluctuation pattern shows significant
transitions between local peaks and valleys. From our EMC simulation, it has been
demonstrated that fluctuations in the drain current amplitude and the threshold
4 Atomistic Simulations on Reliability 63
voltage have been dependent on particular random dopant distribution type, i.e., its
number within the channel area and its position, in addition to having strong
correlation on strategically positioned interface traps along the channel from source
to drain.
In order to truly represent the amplitude variation to show more dependence on
spatial positioning of trap than specific random dopant type, the expectation value
of the statistic (drain current or threshold voltage amplitude change) or the average
term needs to be studied out of a significant number of possible random channel
dopant distributions. These averaged-out values of drain current and threshold
voltage fluctuations as extracted by the simulations can be analytically rationalized
through the concept of device physics of scaled MOSFETs. We propose herein the
following analytical physical explanations supporting the simulated results obtained
for drain current fluctuations and threshold voltage fluctuations with associated
fluctuations in its standard deviation induced by trap’s occurrences in the channel
close to the interface spatially located between source and drain.
It has long been known that random dopant fluctuations cause fluctuations in the
surface potential along the channel from source to drain and also from interface to
some depth into the bulk. The fact that at strong inversion and saturation condition,
the ready availability of inversion charge density smears out the surface potential
variation, thus possible fluctuation in drain current and threshold voltage, needs
a second look. With considerable scaling of gate length along with narrow width
MOSFET devices, there are only few 100 atoms available per device area of scaled
MOSFET and considering their random positioning along the channel and also
along the depth from interface towards bulk, based on the fact that the depletion
region available underneath the channel itself has a spatial random distribution
owing to the random doping of the bulk, the inversion charge density formed
by surface band bending is not continuous along the interface from source to
drain, rather it is discrete and spatially random and nonuniform. Therefore the
very possibility of screened out or smeared out surface potential at the interface
is never a reality and depending upon the local fluctuation enhanced inhomogeneity
of inversion charge density even at high gate bias, there will be sizeable spatial
variation of surface potential at the interface from source towards drain. In addition
to experiencing fluctuation due to random discreteness of channel dopant position
and its number, the local surface potential fluctuation is also enhanced by the
strategic location of trap or a number of traps interspersed between one another
and distributed along the channel from source to drain. A repulsive trap center
originating from the presence of acceptor like traps will enhance the spike in local
potential amplitude at the trap’s position and if this trap happens to be situated
64 D. Vasileska and N. Ashraf
right at the source side or even a few nm away from the source side, the spike
in potential barrier is going to significantly impede the carrier flow and thereby
reduce the number of carriers that reach the drain side and contribute to current
value. A trap’s positioning at the source side being repulsive type consequently
causes significant negative shift in current value and significant rise in fluctuation
percentage value of relative drain current amplitude. This phenomenon is not
uncommon in modern process controlled MOSFET fabrication as most RTF related
phenomenon are caused by acceptor like traps being repulsive in nature. As the
drain bias increases ensuring saturation conditions, the spatial conduction energy
band gradually decreases towards the drain, enhancing drift related carrier transport
where the carriers either enter the condition of velocity saturation or velocity
overshoot near the drain zone and easily surmount any local barrier constituted
surface potential fluctuation. Hence if the traps are near the drain side, the fluctuation
pattern reduces to a steady and stable value. This physical aspect is corroborated
by all the simulation outputs of present research work regarding drain current
fluctuation. When the double-trap is located at the pinch-off region there is slight
increase in the current degradation.
For the threshold voltage fluctuations and fluctuation in standard deviation of
threshold voltage as extracted by our simulation results, the analytical explanation
holds and since a trap’s presence at the source side of the channel is causing
maximum resistance to carrier flow by showing a peak of local surface potential
at trap’s position, therefore this will affect the inversion charge density available
at threshold and shift the threshold voltage by altering the gate bias conditions at
threshold. Therefore relative threshold voltage amplitude reaches a high peak near
source side trap positions and due to presence of high enough energetic carriers
along the drain region, a trap’s positioning there would not alter the inversion
condition to the extent that a considerable shift in threshold voltage amplitude will
be expected. Hence, the threshold voltage shift for drain side-positioned traps will be
minimal. This physical aspect is also vindicated by the available data statistics that
have been shown in the plots shown previously for threshold voltage variation with
its fluctuation in standard deviation of threshold voltage. The explanation presented
in this subsection from device physics standpoint is attributed to the condition that
a large number of channel random dopant distributions are considered and have
been averaged to arrive at the values of final drain current fluctuation statistic and
threshold voltage fluctuation statistic. Expectedly, the randomness of the distribution
of channel and bulk dopants will induce comprehensive randomness in local surface
potential fluctuations making the after effect on the drain current fluctuations
and threshold voltage fluctuations to be purely statistical. In essence, the plots
shown with different random dopant distributions related drain current amplitude
fluctuation and threshold voltage variation reveal a very important observation that
strategically devised random channel dopant distributions through ion implantation
can more successfully suppress the variability in drain current and threshold voltage
and excel against RTF related device failure.
4 Atomistic Simulations on Reliability 65
Each trap acts as a potential scattering center and impacts the scattering related
transport efficiency. When the traps are closely apart and randomly positioned along
the channel from source to drain, trap’s interaction with carrier electron in the inver-
sion channel and fixed ions in the underlying depletion region modifies the short
range trap-to-electron–electron and trap-to-electron–ion Coulombic potential giving
rise to significant Coulomb scattering related mobility reduction. Since our EMC
transport kernel is equipped with properly accounting for this Coulomb effect, the
results of the simulations perfectly reveal the fact that due to the random reduction
of transport mobility of carrier due to trap’s position and interactions with nearby
electrons and ions, drain current amplitude reduction and hence larger fluctuation
of drain current amplitude is expected which will be spatially nonuniform. The
Coulomb related short range and long range e–e and e–ion interaction with traps will
also result in significant threshold voltage shift at inversion condition on the gate,
hence is responsible for trap related threshold voltage fluctuation. The modification
of Coulomb potential seen by a trap surrounding its vicinity where a significant or no
carrier may reside and significant or no ion may reside stemming from their mutual
random and discrete nature, is also severely impacted by a trap’s proximity to a
nearby trap. Since for all our simulations, the traps were very closely positioned to
one another within 1 nm separation, this will induce more Coulomb potential change
and changes in local short range trap-electron and trap-ion interaction affecting
carrier mobility. The rapid screening of Coulomb potential only occurs at the drain
side with high bias on drain and hence at those locations (except the pinch-off
region), mobility reduction is minimal and the fluctuation pattern for both relative
drain current and threshold voltage is minimized. In addition, depending on the
discrete positioning and number density of channel dopants, the particular trap-to-
electron and trap-to-ion interaction for both short and long range Coulomb forces
will be fully distinctive and random causing randomness in mobility reduction
spreading over different random channel and bulk dopant distributions. Larger
spikes and surges in local surface potential values expected near the source side,
will contribute to associated variation in short range and long range Coulomb force
related scattering probability of carrier transport for a trap positioned there.
From the EMC simulation results of our present research work on RTF related drain
current and threshold voltage fluctuation, a close observation reveals the important
viewpoint that for interface trap occurrences, only traps within a thousandth fraction
of a nm from the interface in the channel are considered for study. It has been shown
by the exhaustive work of Professor Dragica Vasileska and her co-researchers that
both the drain current and threshold voltage are strongly correlated to within a few
66 D. Vasileska and N. Ashraf
nm from the interface to the vertical channel depth. Since the traps designed in our
simulation are almost within a considerable fraction of a nm from the interface,
it is vindicated from previous simulation study with present EMC code that both
drain current fluctuation pattern and threshold voltage fluctuation pattern will show
strong positive correlation with trap’s proximity to the interface. This feature has
been found to be another potent factor for reasoning out wide fluctuation patterns
observed for both drain current amplitude and threshold voltage variation with its
standard deviation. In addition, the interface roughness and chemical imperfections
due to trap’s positioning as scattering centers at random locations from source to
drain are also a significant source for interface roughness scattering related mobility
reduction which from theory is found to be one of most dominant causes for
mobility reduction at high gate and drain bias, i.e. saturation bias condition of the
MOSFET which we employed in our simulation for on current fluctuations study.
The interface roughness will reduce the overall effective mobility with the traps
acting as scattering sites randomly distributed within the inversion channel, being in
conjunction with Coulomb-force related mobility reduction and will impact spatial
random and statistical variation in both relative drain current amplitude.
References
1. Keyes R. W., The effect of randomness in the distribution of impurity atoms on FET thresholds,
Appl. Phys. vol. 8, 251–259 (1975).
2. Mizuno T., Okumtura J. and Toriumi A., Experimental study of threshold voltage fluctuation
due to statistical variation of channel dopant number in MOSFET’s, IEEE Trans. Electron
Devices 41, 2216–2221 (1994).
3. Vasileska D., Gross W. J., Kafedziski V. and Ferry D. K., Convergence properties of the
Bi-CGSTAB method for the solution of the 3D Poisson and 3D electron current continuity
equations for scaled Si MOSFETs, VLSI Design 8, Nos. 1-4, 301–305, (1998).
4. Fischetti M. V., Towards Fully Quantum Mechanical 3D Device Simulations, Journal of
Computational Electronics 1, 81–85 (2002).
5. Fischetti M. V., Effect of the electron–plasmon interaction on the electron mo-bility in silicon,
Phys. Rev. B 44, 5527–5534 (1991).
6. Lilly M. P., Eisenstein J. P., Pfeiffer L. N., and West K. W., Coulomb Drag in the Extreme
Quantum Limit, Phys. Rev Lett. 80, pp. 1714–1717, (1998).
7. Wong H. S. and Taur Y., Three-dimensional “atomistic” simulation of 50 nm FETs, in Proc.
IEDM, 29.2.1 (1993).
8. Gross W. J., Vasileska D. and Ferry D. K., 3D Simulations of Ultra-Small MOSFETs with
Real-Space Treatment of the Electron–Electron and Electron–Ion Interactions, VLSI Design,
Vol. 10, 437–452 (2000).
9. Asenov A., Random Dopant Induced Threshold Voltage Lowering and Fluctuations in Sub-
0.1 um MOSFETs: A 3-D Atomistic Simulation Study, IEEE Trans. Electron Devices 45,
2505–2513 (1998).
10. Sano N., Matsuzawa K., Mukai M., and Nakayama N., On discrete random dopant modeling
in drift-diffusion simulations: physical meaning of ‘atom istic dopants, Microelectronics
Reliability, Volume 42, Issue 2, 189–199 Feb. (2002).
4 Atomistic Simulations on Reliability 67
11. Ferry D. K., Kriman A. M., Kann M. J., and Joshi R. P., Molecular dynamics extensions
of Monte Carlo simulation in semiconductor modeling, Computer Physics Comm., vol. 67,
119–134 (1991).
12. Gross W. J., Vasileska D. and Ferry D. K., A Novel Approach for Introducing the Electron–
Electron and Electron–Impurity Interactions in Particle-Based Simulations, IEEE Electron
Device Lett. 20, No. 9, 463–465 (1999).
13. Greengard L. and Rokhlin V., A fast algorithm for particle simulations, J. Comput. Phys. 135,
280–292 (1997).
14. Beatson R. and L. Greengard, A short course on fast multipole methods, in Wavelets, Multilevel
Methods and Elliptic PDEs (Leicester, 1996), ser. Nu mer.Math. Sci. Comput. New York:
Oxford Univ. Press, 1–37 (1997).
15. Cheng H., Greengard L., and Rokhlin V., A Fast Adaptive Multipole Algo-rithm in Three
Dimensions, J. Comput. Phys. 155, 468–498 (1999).
16. FMMPART3D user’s guide, version 1.0 ed., MadMax Optics, Hamden, CT, USA.
17. Khan H. R.,Vasileska D., Ahmed S. S., Ringhofer C. and Heitzinger C., Mod-eling of FinFETs:
3D MC Simulation Using FMM and Unintentional Doping Effects on Device Operation,
Journal of Computational Electronics, Vol. 3, Nos. 3-4, 337–340 (2005).
18. Stolk P. A., Widdershoven F. P. and Klaassen D. B. M., IEEE Trans. Electron Devices 45,
1960–1971 (1998).
19. Horstmann J. T., Hilleringmann U. and Goser K. F., IEEE Trans. Electron Devices 45, 299–306
(1998).
20. Majima H., Ishikuro H., and Hiramoto T., IEEE Electron Dev. Lett. 21, 396–398 (2000).
21. Agostinelli T., Arca M., Caironi M., Ferrero V., Natali D. and Sampietro M., Trapping effects
on the frequency response of dithiolane-based planar photodetectors, ICSM0 International
Conference on Science and Technology of Synthetic Metals.
Chapter 5
On-Chip Characterization of Statistical Device
Degradation
Abstract Bias temperature instability (BTI) is one of the most critical degradation
mechanisms that occur in modern semiconductor devices. The degradation due
to BTI is transient, and known to be greatly influenced by bias voltages and
temperature, making it very difficult to detect possible BTI-related failures during
manufacturing test. Characterization and modeling of BTI is hence extremely
important to protect a chip from BTI-related failures. In this chapter, an array
structure that accelerates the statistical characterization of BTI is described. By over-
lapping the stress-application period for each device, measurements on hundreds or
thousands of devices can be conducted concurrently. Test chip measurement results
that provides a statistical insight on the parameters of BTI-related degradation
process are also presented.
1 Introduction
channel formation. The temporal change in the charges in the channel region will
cause a drift in circuit performance or, in the worst case, will result in a malfunction
of the circuit.
As semiconductor circuits are utilized in various applications, diverse levels
of reliability are demanded. Certain applications impose very stringent reliability
requirements. Examples include transportation systems such as automated car-
driving systems or medical applications. On the other hand, in another application,
cost lowering may be pursued as the highest priority, requiring a certain level of
reliability to guarantee equal or longer than product lifespans. It has now become
a common practice to design for the reliability of a circuit in addition to its
performance.
A device model that incorporates a transient change in the device parameters is a
key enabler to facilitate the design of device and circuit reliability. As a solid basis
for building such a model, it is important to establish an accurate measurement
methodology that will contribute to understanding the mechanisms behind the
device degradation. In particular, the phenomena of bias temperature instability
(BTI) are drawing increasing attention.
In this chapter, we survey characterization methods for the purpose of device
modeling that includes the effects of temporal parameter degradation. We begin
by quickly reviewing the physical mechanisms of temporal threshold voltage
(Vth) changes: random telegraph noise (RTN), negative bias temperature instability
(NBTI), and positive bias temperature instability (PBTI), which are considered to
be interrelated and important in miniaturized devices. Then, we review the work to
accurately capture the RTN, NBTI, and PBTI phenomena. Particular focus is placed
on the approaches that utilize on-chip circuits to enhance the accuracy and fidelity
of the measurements. Finally, the characterization of statistical degradation will be
reviewed with examples.
BTI is considered to be one of the major limiting factors for long-term performance
and reliability of semiconductor devices [25, 30].
NBTI and PBTI degradations are observed in pMOS and nMOS transistors,
respectively. These phenomena are observed as a gradual increase in threshold
voltage during circuit operation. The threshold voltage increase is accelerated under
a strong vertical electric field and under an elevated temperature. It is also known
that the increased threshold voltage will be partially recovered once the stress
condition of the electric field or temperature is alleviated. Hence, their models
naturally become dependent on history. The prediction of a device parameter at a
certain time in the future has been found to be very difficult.
Physical mechanisms of threshold voltage shifts due to NBTI have long been
under debate. As device dimensions decreases, there is an urgent need for develop-
ing a model that is physically correct and thus predictable for both degradation and
5 On-Chip Characterization of Statistical Device Degradation 71
are approximately the sizes of logic LSIs is thus critically important. Additionally,
note that as device size decreases, the measurements become more difficult because
small parasitic elements in the measurement system may significantly alter the
measurement results.
voltage is continuously applied to the device. Hence, the degradation of a device can
be observed without recovery associated with the stress interruption. An ultra-fast
on-the-fly (UFOTF) method improved timing resolution to 1 s [13]. This method
utilizes a current-to-voltage converter to isolate the device of interest from an oscil-
loscope probe. However, these methods require the transconductance of the device
in order to convert the measured current to the corresponding threshold voltage.
Because transconductance may also be subject to variation between devices, error
associated with the conversion may be involved. In addition, the sensitivity of the
threshold voltage to the drain current tends to be very weak in the saturation region,
in which a strong stress bias condition for the device is achieved. In this case, it is
difficult to obtain high accuracy.
In [14], the drain current measurement of a transistor in the weak inversion
region, instead of the saturation region, is proposed to utilize the exponential
sensitivity. In order to overcome a slow settling time due to the extremely small
charging and discharging current of a device in the weak-inversion region, the device
of interest has to be large, and an assist circuit is required. In the literature, it is
reported that a measurement delay as fast as 400 ns has been achieved. Because this
method also relies on the drain current measurement, a conversion is also required
to obtain the threshold voltage by using separately acquired transconductance.
pass gate transistor. To keep track of the degradation of SRAM cells, a ring oscillator
composed of embedded SRAM was proposed in [26].
When measuring BTI phenomena, bias voltages have to be applied for a long time
to observe a detectable degradation in the threshold voltage. The largest obstacle for
statistical BTI characterization is the very long time required for the measurement
of multiple devices.
Upon close look examination of the BTI measurements, it is observed that each
threshold voltage measurement takes up only a very short time as compared to the
stress or recovery period. Most of the measurement period is used for applying
the bias voltages to the device. In such a condition, which is typical in BTI
characterizations, the total period for the measurement of multiple devices will be
significantly shortened by the overlapping stress periods of different devices.
A straightforward approach is to conduct I–V curve measurements for multiple
devices using a high-pin-count probe. In that case, all electrodes of the devices
need to be equipped with individual pad electrodes for probing. The use of a probe
card with a very high pin-count correspondingly requires a large number of SMUs,
which is impractical. Signal synchronizations become more complicated and costly
5 On-Chip Characterization of Statistical Device Degradation 75
Operating
mode MS SA MS
measurement Time
DUT0 stress
DUT1 stress
DUT2 stress
DUT3 stress
DUT4 stress
DUT5 stress
DUT(n−1) stress
A B
Fig. 5.1 Stress period overlapping on n devices. Stress periods with equal durations can be
achieved for all devices
a b
VSS
MEAS sw0
STRS XSEL
VSTR
VSS sw1 DUT
VSTR
YSEL sw4
sw2
DUT
MSEL Control sw sw3
logic cell VREC
cntl
VM
VREC
VM sw6 sw5
VDD VDD
VCC sw7
VCC
DUT unit DUT cell
c
YSEL<7:0>
DUT unit
Y−decoder (3b)
W/L=705/60 W/L=705/120
32 DUTs 32 DUTs
W/L=360/60 W/L=360/120
32 DUTs 32 DUTs
XSEL<15:0>
Address X−decoder (4b)
BTIarray
Fig. 5.2 Circuit structure of BTIarray for a pMOS DUT. (a) The DUT unit consists of “control
logic” and a “DUT cell.” All signals pass through the DUT unit, allowing an array circuit to be
easily formed. Control logic generates switch control signals for the DUT cell. (b) Schematic
diagram of a DUT cell. Switches are realized by pass-gate transistors. (c) Placement of DUT units
in an implementation of a BTIarray [22]. An array of 128 DUTs is formed. Power supply and
control signals are omitted for clarity except for XSEL and YSEL
Switch Control
In the SA mode, stress voltage is applied for a DUT in the DUT cell by closing
switches sw1, sw4, and sw6. Other switches are controlled to be open. The stress
voltage is VSTR minus VDD , where VSTR is more negative than VDD . In the RA
mode, switches sw0, sw3, and sw6 will be closed, and others are left open. VREC is
a recovery voltage and nominally equals VDD , which turns devices off.
In the two measurement modes, MS and MR, a measurement is carried out for a
DUT that is selected by an address signal. Other DUTs that are not selected are all
in either a stress or a recovery condition. DUT selection is specified by XSEL and
YSEL signals, both of which are generated as one-hot signals by the decoders in
78 T. Sato and H. Awano
Fig. 5.2c. Threshold voltage is measured by the constant-current method [16]. The
voltage that appears at node VM is captured during a constant current is applied to
Vcc. Hence, the switches sw0, sw2, sw5, and sw7 are closed in the selected DUT
cell. The switches of other DUT cells that are not selected are equal to those of
the SA mode or RA mode, depending on whether the current mode is MS or MR,
respectively.
The control logic circuit in the DUT unit in Fig. 5.2a can be realized according to
the truth table in Table 5.1. In the BTIarray realization in [22], mode switching
is achieved by the combination of three control signals: “MEAS,” “STRS,” and
“MSEL.” A DUT is selected for the measurement only when both MEAS and MSEL
are “1,” i.e., the array is in one of the measurement modes, and the address of the
DUT is selected. Otherwise, the constant bias voltages that correspond to either the
stress or recovery mode is applied to all the DUTs in parallel.
Advantages of BTIarray
a b
DUT
Control
332.8 μm
128 DUT array
7.4 μm
logic
225μm x 58.6μm
AND
Switches
10.8 μm 489.2 μm
Layout of the DUT unit. Chip microphotograph.
Fig. 5.3 An example implementation of the BTIarray. (a) Layout of the DUT unit, showing
selected layers up to the second level wiring. (b) A total of 128 DUTs are implemented sharing
only 16 probing pads [22]
The implementation and measurement results of the BTIarray for pMOS transistors
were first presented in [22]. Then, in [1], measurement results of BTIarrays for both
pMOS and nMOS transistors were presented. The nMOS array was designed as a
separate array from the pMOS array, but the circuit structure of the two arrays is
similar. The only difference between the two arrays is the switched polarities of the
stress and recovery voltages, and the connection of the DUT.
The specifications of the array design using a 65-nm process are summarized
in Table 5.2. In both arrays, four transistor sizes were implemented as DUTs in
combination with two channel lengths and two channel widths.
The layout diagram of the pMOS DUT unit including the pass-gate switches
and control logic circuit is shown in Fig. 5.3a. The layout size of a DUT unit is
10.8 m 7.4 m. Maximum layout regularities have been pursued by sharing the
single layout skeleton in Fig. 5.3a for four different DUT sizes. As a result, the
layout designs of the switches and the control logic were completely identical except
for the sizes of transistors that were located at the center of the DUT unit. This
regularity minimized layout-dependent device parameter variation. Additionally, it
greatly eased the layout design of the arrays.
80 T. Sato and H. Awano
As observed Fig. 5.2c, the number of DUT units with an equally sized DUT is
32. In these designs, the DUT units are regularly placed to form a 4 8 array. Four
4 8 arrays are again arranged to construct an entire array; thus, 128 DUTs are
implemented in total. A microphotograph of a test chip is presented in Fig. 5.3b.
The layout size of the array including decoders and periphery circuit is 225.2 m
62.3 m, and the total test circuit area including the probing pads is 489.2 m
332.8 m.
Considering the circuit structure of the DUT unit and formation of the entire array
structure, the number of DUTs in an array can be easily scaled. More recently, it has
been reported that 3,996 DUTs have been successfully implemented in an array
circuit by extending the concept of BTIarray without compromising measurement
accuracy [2]. A slight circuit modification was made to suppress leakage current
caused by a large number of off-state pass-gate switches to maintain measurement
accuracy. For that purpose, the concept of a dummy supply voltage [23] is utilized
to eliminate the leakage current. If the dummy supply voltage is not used, the
constant current, which should flow through a selected DUT to develop its threshold
voltage, becomes inaccurate because the current will be reduced significantly by the
leakages.
Fig. 5.4 Instrumentation for the measurement of BTIarray. Stress, recovery, and measurements
can be entirely written in a macro language, BTIscript
Table 5.3 Subset of macro commands realized as functions in Python used in BTIscript
Macro Description
stress_all(duration) All DUTs are biased to stress for duration seconds.
recovery_all(duration) All DUTs are biased to recover for duration seconds.
measure_stress_all(burst) Measure all DUTs in ascending order. Measurement
of a DUT is conducted for burst times while all other
DUTs are biased to stress.
measure_stress_one(adrs, burst) Measure single DUT specified by address adrs. Mea-
surement of a DUT is conducted for burst times while
all other DUTs are biased to stress.
measure_recovery_all(burst) Measure all DUTs in ascending order. Measurement
of a DUT is conducted by burst times while all other
DUTs are biased to recover.
measure_recovery_one(adrs, Measure single DUT specified by address adrs. Mea-
burst) surement of a DUT is conducted by burst times while
all other DUTs are biased to recover.
Parameters in parentheses may have a default value as in a regular Python function. The addition
of new commands that are dedicated to a specific measurement is easy [21]
with the Python language, such as predefined and user-defined classes, subroutine
calls, and flow controls, can be fully utilized.
The scripting language is useful to efficiently develop and debug measurement
scripts because of its user-friendliness. Without such an environment based on
a programming language, lengthy, complicated, and thus error-prone commands
would have to be hand-written. In particular, iterations and procedure calls with
parameters are very useful to represent measurement procedures both formally and
compactly. Macro commands covering typical measurement procedures for BTI
degradation are listed in Table 5.3.
To understand the versatility of the script, let us proceed through an example
script in Fig. 5.8 by comparing the commands to the measurement results, such as
in Fig. 5.9. Other measurement examples can also be found in [21, 27, 29].
Figure 5.5 shows a block diagram of the measurement system. A hand-written
BTIscript is first converted into a pattern file and GPIB commands. The pattern file
defines the transition timings of the control signals, and the commands from GPIB
and triggers from the FPGA-based pattern generator control the parametric analyzer.
The generated pattern file is transferred to the FPGA via a USB interface.
Sequences of bit patterns and output timings are transferred from the host PC via
82 T. Sato and H. Awano
Fig. 5.5 Block diagram of the FPGA-based pattern generator. A 24-bit command output is
generated to control the BTIarray and a parametric analyzer
USB. Each atomic command is formatted in a 64 bit command word. The upper
40 bits and lower 24 bits correspond to the count value and the state of the control
signals, respectively. The count value determines the duration for which the control
signals of a mode are applied. It is used as an initial value of the 40-bit countdown
counter. When the counter reaches zero, the next command word is fetched from the
FIFO queue to reload the counter and the output register.
The GPIB commands setup output voltages, output current, and the measurement
parameters such as integration time. Once the measurement is started, threshold
voltage measurement is conducted by the parametric analyzer according to the
trigger signal generated in the FPGA.
Time (s)
Time (s)
Fig. 5.6 Example threshold voltage measurement of a pMOS device, showing multilevel random
telegraph noise (DUT size: W/L = 360/60 nm/nm)
Ideally, the currents and voltages of the measurement paths should be switched
instantly by the on-chip pass-gate switches. However, in reality, there is a lower
limit on the transition time because of parasitic capacitances in the circuit and the
measurement system. Hence, the output voltage node requires a finite time to settle
to a final value. The settling time of the output voltage of the BTIarray has also been
evaluated in [1] using an oscilloscope. The measured settling time was distributed
between 10 and 60 s for almost all DUTs in the pMOS and nMOS arrays. The
minimum interval of the measurements can be reduced to 100 s or shorter in the
128-DUT array.
Figure 5.7 depicts the scenario for a BTI degradation measurement. The BTIscript
that defines the scenario is also shown in Fig. 5.8. First, the output values of the
voltage sources and current source are set (lines 1–8). Next, measurement timings
are defined for later use (lines 10–12). Then, the threshold voltages of all DUTs
in recovery bias mode are measured to obtain initial threshold voltages for the
DUTs before applying stress (line 21). The measurement interval is set to 10 ms,
which is determined by the minimum interval of the parametric analyzer. This step
84 T. Sato and H. Awano
Fig. 5.8 Script that realizes measurement scenario of Fig. 5.7. Stress, recovery, and measurement
processes are written using macro commands. After voltage sources and current source are
initialized and measurement interval is defined, threshold voltages of DUTs in recovery bias mode
are measured 1,000 times. Then, all DUTs undergo stress for 20,000 s during which time threshold
voltage measurements are conducted intermittently to reduce effect of interrupting stress. Finally,
all DUTs returned to recovery mode, and threshold voltages are measured 1,000 times
5 On-Chip Characterization of Statistical Device Degradation 85
requires 112810 ms D 1:28 s in total. Note that, during these measurements, the
DUTs that are not selected for measurement are kept in recovery bias mode to avoid
degradation. After that, all DUTs enter stress bias mode (lines 22–24). Stress is
continuously applied for 20,000 s. Measurements are intermittently conducted
during the stress period for all DUTs in series at 1, 2, 5, 10, 20, 50, 100, 500,
1,000, 5,000, 10,000, 12,000, 15,000, 18,000 and 20,000 s. The macro command
“measure_stress_all” is repeatedly issued by the “for” statement according to the
list of measurement intervals. Then, all DUTs return to recovery bias mode, and
measurements are conducted for all DUTs 1,000 times (lines 25–26). Repetitions
of 1,000 are realized by issuing the macro command “measure_recovery_all” in a
for loop.
The upper part of Fig. 5.7 shows the detailed timing of the control signals used
for acquiring a threshold voltage of the selected DUT in the BTIarray. First, a
“MEAS” signal is asserted. Then, the measurement trigger is issued after 100 s
have elapsed. This timing was determined according to the settling time of the output
node voltage. This ensures that the parametric analyzer starts an integration of VM
after the node voltage has been stabilized.
The threshold voltage measurement is based on the constant current method [16]
in which the gate voltage of the DUT is switched from VSTR to VSS when the DUT
is under measurement. The source voltage becomes the threshold voltage of the
selected DUT, which reduces the stress on the DUT under measurement. Because
the change in the threshold voltages contains a very fast component, partial recovery
may occur during the measurement. To minimize this recovery, the “MEAS” signal
is negated immediately after the sampling time required by the parametric analyzer
has elapsed.
Examples of the measured threshold voltage shift are shown in Fig. 5.9. The results
for two DUT sizes are presented: W/L = 360 nm/60 nm and 705 nm/120 nm. Ten
traces were randomly selected from each size. In the 20,000 s stress period, the
threshold voltage increased for all DUTs, but the traces of the increases differed
among DUTs. The variation in the threshold voltage increases was larger for DUTs
with a smaller channel area. Because the variation is expected to increase as the
transistor size decreases, statistical representations of the degradation parameters
are required.
In addition, a nonmonotonic increase in the threshold voltage was observed
for almost all of the DUTs. Although its cause is still not understood, there are
two possible hypotheses. The first hypothesis is recovery during measurement. As
described above, the measured DUTs experience less stress voltage than the other
fully stressed DUTs. This may lead to device-dependent partial recovery during
measurement.
86 T. Sato and H. Awano
a b
Fig. 5.9 Temporal changes in the threshold voltage (pMOS array) for ten randomly selected
DUTs. Variation in threshold voltage increases was larger for smaller-area DUTs. In the recovery
period, discrete changes in threshold voltage increases were observed for most of the DUTs. The
discrete changes are more distinct for smaller-area DUTs. (a) DUT size: W/L=360 nm/60 nm.
(b) DUT size: W/L=705 nm/120 nm
The other hypothesis is the interaction between carrier traps and emissions during
the stress period. The nonmonotonic increase may have been caused by the carrier
trap and emission process at interface defects, which is considered to be the major
mechanism of BTI degradation [10, 31]. This is a stochastic process; thus, the
fluctuation in the threshold voltage is observed at any time.
The time courses of threshold voltage recovery are magnified to the right of the
figures. The discrete recovery of the threshold voltages was more prominent for
the DUTs with a small channel area. Smoother recovery was again observed for
the DUTs with a larger channel area. This trend held true for the reminder of the
DUTs. The origin can also be explained by the averaging effect of carrier trap and
emission process. Under an assumption of uniform defect density over all areas of
the measured chip, a larger number of interface traps should be found in the DUTs
of a larger channel area than in those of a smaller channel area. The trapping and
emission of carriers occur more often in the larger DUTs, and the threshold voltage
shift caused by the trap of a single carrier is smaller than that of the smaller DUTs.
To investigate the stair-like changes in more detail, the response to the stress
and recovery bias was repeatedly measured for the DUT with a small channel area.
A stress with a duration of 1,000 s was first applied, and then, recovery mode
was applied for 1,280 s, during which 1,000 measurements were conducted for
each DUT. This set of stress-recovery measurements was repeated 100 times to
characterize the stochastic recovery process. The upper figure in Fig. 5.10 shows the
recovery period of the threshold voltage shift. The results of 30 trials, which are
again randomly selected from 100 trials, are presented.
In the recovery period, the threshold voltage shifts of an equal magnitude were
found throughout different repetitions. The magnitude of the threshold voltage
shift was determined by the defect location in the channel. Hence, the defects
could be identified by the magnitude. This characterization method is known as
time dependent defect spectroscopy (TDDS) [9]. The associated TDDS plot is also
5 On-Chip Characterization of Statistical Device Degradation 87
VTH (a.u.)
measurements. The results of #1 #0
30 trials randomly selected 20
from 100 trials are presented. #0
The corresponding TDDS 10
#1 #0
plot is shown in the lower #0
graph [1]
0
0 100 200 300 400
Time (s)
10
8
ΔVTH (a.u.)
6
Defect #0
4
Defect #1
2
0 100 200 300 400
Emission time (s)
a b
Fig. 5.11 Temporal changes in threshold voltage for ten randomly selected DUTs, measured for
an nMOS array. Both the degradation and recovery occurred in a very short time
shown in Fig. 5.10. There are two major clusters of defects. The emission time
constant of Defect #0 assumes a value of approximately 10 to 50 s, whereas that of
Defect #1 distributes wider in the range of 100 to 400 s. Characterizing the range and
distribution of the emission time is important in device lifetime extension methods,
such as a circuit that alternately uses one of two identical circuits [15].
Similarly, Fig. 5.11 shows the results of the nMOS array obtained using the same
measurement scenario as the pMOS array. The DUT set is again randomly selected.
By comparing the graphs with those in [1], readers can find the similar trend of
88 T. Sato and H. Awano
the temporal threshold voltage changes for nMOS DUTs. Although smaller than
that of pMOS DUTs, the threshold voltage of an nMOS DUT is increased. The
threshold voltage shift seemingly reached a plateau relatively quickly within 5,000 s,
and remained almost constant for the reminder of the stress time. In this device,
nMOS DUTs seemingly lack “slow trap” components [18]. In addition, regardless
of the device size, the threshold voltages of most of the DUTs recovered almost
completely once the stress was removed, which is different from the case of the
pMOS DUTs.
As we have observed in Fig. 5.9, there was a large difference in the threshold voltage
shift even for DUTs of equal channel-area size. The threshold voltage increase
during a stress period has been commonly modeled using a power law model in
which the stress period and threshold voltage increase are related by Vth / t n ,
where t is the stress period, and n is the power law exponent that varies between
transistors.
The distributions of the power law exponent for 720 DUTs (180 DUTs for each
size) measured using five pMOS arrays at 125 ı C are depicted in Fig. 5.12. The
a b
15 15
Sample quantile
Sample quantile
0.4 0.4
μ = -1.99 0.3 μ = -2.05 0.3
σ = 0.47 σ = 0.35
Frequency
Frequency
0.2 0.2
10 0.1
10 0.1
0.0 0.0
0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4
Model quantile Model quantile
5 5
0 0
0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4
n n
c 15 d 15
Sample quantile
0.4 0.4
Sample quantile
Frequency
0.2 0.2
10 0.1
10 0.1
0.0 0.0
0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4
Model quantile
5 5 Model quantile
0 0
0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4
n n
Fig. 5.12 Measured distributions of power law exponent and their log-normal approximations
for four sizes of DUTs [1]. The channel-area dependency of the variation is evident.
(a) W/L D 360 nm/60 nm. (b) W/L D 360 nm/120 nm. (c) W/L D 705 nm/60 nm. (d) W/L D 705
nm/120 nm
5 On-Chip Characterization of Statistical Device Degradation 89
5 W/L=360/60 5 W/L=360/60
W/L=705/60 W/L=705/60
4 W/L=360/120 4 W/L=360/120
(a.u.)
(a.u.)
W/L=705/120 W/L=705/120
3 3
2 2
1 1
0 0
0 1 2 3 4 5 Probability
Initial VTH (a.u.)
Fig. 5.13 Correlations between threshold voltages of fresh DUTs and the threshold voltage shift
after a stress with a duration of 20,000 s [1]. The approximated distributions of the threshold
voltage shift are also plotted
variance in the exponent decreased as the DUT size increased. All distributions
are approximated by a log-normal distribution, which is indicated by a solid line.
The two shape parameters of each distribution, and , are also provided. An
average of the log-normal distribution , which represents the logarithmic mean
of the power-law exponent, was approximately 2 regardless of the DUT size.1 The
inset graph shows quantile-quantile plot for the log-normal approximation.
The standard deviation of the power-law exponent increases as the channel
area of the DUTs decreases. The standard deviation of the power-law exponent
is inversely proportional to the channel area of a device. The inverse dependency
between the device channel area and exponent variance has also been reported in [2].
The exponent variation increases at a faster rate than the threshold voltage, whose
variation is inversely proportional to the square root of the channel area.
The accurate consideration of correlations between device parameters is an
effective way to avoid overdesign. Figure 5.13 shows a scatter plot between
threshold voltages of fresh DUTs and threshold voltage shifts after a stress period
of 20,000 s. The results of the four DUT sizes are shown. For the transistors of these
sizes, initial threshold voltages and the threshold voltage shifts fit normal and log-
normal distributions, respectively. As expected, no correlation was found between
the initial threshold voltage and its shift because the variation sources are different
for these variations.
1
D 2 approximately corresponds to an exponent n D 0:15.
90 T. Sato and H. Awano
The time to complete the scenario in Fig. 5.7 was approximately 5 h and 55 min
whereas that for a single DUT was approximately 5 h and 33 min. If a single DUT
measurement is serially conducted for 128 DUTs, the measurement would require
approximately 710 h, which is 29.6 days, even when the time for moving the probe
needles to the next DUT is ignored. Considering that 128 DUTs are integrated in an
array, BTIarray has thus achieved an increase in speed of a factor of 120 with this
measurement scenario, enabling month-long measurements to be completed within
a day. An ideal increase in speed of a factor of 128 could not be achieved because
the time required for threshold voltage measurements is finite. The parametric
analyzer requires 10 ms for the voltage measurement. Therefore, the acquisitions
of the threshold voltages for 128 DUTs for a total of 1,018 measurements require
approximately 22 min. The use of a faster SMU shall further shorten the total
measurement time.
5 Chapter Summary
proportional to the channel area. (3) Most of the measurement results involve a stair-
like recovery trace, which suggests the interface trap charge is playing a key role in
the threshold voltage shift.
Acknowledgements Part of this work has been supported by MEXT/JSPS KAKENHI Grant
Numbers 22360143 and 20431992.
References
1. Awano, H., Hiromoto, M., Sato, T.: BTIarray: A time-overlapping transistor array for efficient
statistical characterization of bias temperature instability. IEEE Trans. Dev. & Mat. Reliab.
(2014)
2. Awano, H., Hiromoto, M., Sato, T.: Variability in device degradations: Statistical observation
of NBTI for 3996 transistors. In: Proc. European Solid-State Dev. Res. Conf. (ESSDERC)
(2014)
3. Campbell, J., Lenahan, P., Krishnan, A., Krishnan, S.: Observations of NBTI-induced atomic-
scale defects. IEEE Trans. Dev. & Mat. Reliab. 6(2), 117–122 (2006)
4. Denais, M., Parthasarathy, C., Ribes, G., Rey-Tauriac, Y., Revil, N., Bravaix, A., Huard, V.,
Perrier, F.: On-the-fly characterization of NBTI in ultra-thin gate oxide PMOSFET’s. In: IEDM
Tech. Dig., pp. 109–112 (2004)
5. Du, G., Ang, D., Teo, Z., Hu, Y.: Ultrafast measurement on NBTI. IEEE Trans. Electron Device
Lett. 30(3), 275–277 (2009)
6. Fujimoto, S., Islam, A.M., Matsumoto, T., Onodera, H.: Inhomogeneous ring oscillator for
within-die variability and RTN characterization. IEEE Trans. Semicond. Manufact. 26(3),
296–305 (2013)
7. Ghosh, A., Rao, R.M., Brown, R.B., Chuang, C.T.: On-chip negative bias temperature
instability sensor using slew rate monitoring circuitry. In: ACM IEEE Intl. Symposium on
Low Power Electronics and Design (2009)
8. Grasser, T., Kaczer, B., Goes, W.: An energy-level perspective of bias temperature instability.
In: Proc. Intl. Reliab. Phys. Symp. (IRPS), pp. 28–38 (2008)
9. Grasser, T., Reisinger, H., Wagner, P., Schanovsky, F., Goes, W., Kaczer, B.: The time
dependent defect spectroscopy (TDDS) for the characterization of the bias temperature
instability. In: Proc. Intl. Reliab. Phys. Symp. (IRPS), pp. 16–25 (2010)
10. Huard, V., Denais, M.: Hole trapping effect on methodology for DC and AC negative bias
temperature instability measurements in pMOS transistors. In: Proc. Intl. Reliab. Phys. Symp.
(IRPS), pp. 40–45 (2004)
11. Kim, J.J., Rao, R., Schaub, J., Ghosh, A., Bansal, A., Zhao, K., Linder, B., Stathis, J.:
PBTI/NBTI monitoring ring oscillator circuits with on-chip Vt characterization and high
frequency AC stress capability. In: Proc. VLSI Technology symposium, pp. 224–225 (2011)
12. Kim, T., Persaud, R., Kim, C.: Silicon odometer: An on-chip reliability monitor for measuring
frequency degradation of digital circuits. IEEE J. Solid-State Circ. 43(4), 874–880 (2008)
13. Kumar, E., Maheta, V., Purawat, S., Islam, A., Olsen, C., Ahmed, K., Alam, M., Mahapatra, S.:
Material dependence of NBTI physical mechanism in silicon oxynitride (SiON) p-MOSFETs:
A comprehensive study by ultra-fast on-the-fly (UF-OTF) iDLIN technique. In: IEDM Tech.
Dig., pp. 809–812. IEEE (2007)
14. Matsumoto, T., Makino, H., Kobayashi, K., Onodera, H.: A 65 nm complementary metal-
oxide-semiconductor 400 ns measurement delay negative-bias-temperature-instability recov-
ery sensor with minimum assist circuit. Japan. J. Appl. Phys. 50(4), 04DE06 (2011)
15. Matsumoto, T., Makino, H., Kobayashi, K., Onodera, H.: Multicore large-scale integration
lifetime extension by negative bias temperature instability recovery-based self-healing. Japan.
J. Appl. Phys. 51, 04DE02 (2012)
92 T. Sato and H. Awano
16. Ortis-Conde, A., García Sánchez, F.J., Liou, J.J., Cerdeira, A., Estrada, M., Yue, Y.: A review
of recent MOSFET threshold voltage extraction methods. Microelec. Reliability 42, 583–596
(2002)
17. Rauch, S.E.: Review and reexamination of reliability effects related to NBTI-induced statistical
variations. IEEE Trans. Dev. & Mat. Reliab. 7(4), 524–529 (2007)
18. Reisinger, H., Blank, O., Heinrigs, W., Gustin, W., Schlunder, C.: A comparison of very fast
to very slow components in degradation and recovery due to NBTI and bulk hole trapping to
existing physical models. IEEE Trans. Dev. & Mat. Reliab. 7(1), 119–129 (2007)
19. Reisinger, H., Blank, O., Heinrigs, W., Muhlhoff, A., Gustin, W., Schlunder, C.: Analysis of
NBTI degradation- and recovery-behavior based on ultra fast VT-measurements. In: Proc. Intl.
Reliab. Phys. Symp. (IRPS), pp. 448–453 (2006)
20. Reisinger, H., Grasser, T., Gustin, W., Schlunder, C.: The statistical analysis of individual
defects constituting NBTI and its implications for modeling DC-and AC-stress. In: Proc. Intl.
Reliab. Phys. Symp. (IRPS), pp. 7–15 (2010)
21. Sato, T., Awano, H., Shimizu, H., Tsutsui, H., Ochi, H.: Statistical observations of NBTI-
induced threshold voltage shifts on small channel-area devices. In: Proc. Intl. Symp. Quality
Electronic Design (ISQED), pp. 306–311. IEEE (2012)
22. Sato, T., Kozaki, T., Uezono, T., Tsutsui, H., Ochi, H.: A device array for efficient bias-
temperature instability measurements. In: Proc. European Solid-State Dev. Res. Conf.
(ESSDERC), pp. 143–146 (2011)
23. Sato, T., Ueyama, H., Nakayama, N., Masu, K.: Accurate array-based measurement for
subthreshold-current of MOS transistors. IEEE J. Solid-State Circ. 44(11), 2977–2986 (2009)
24. da Silva, M., Kaczer, B., Van der Plas, G., Wirth, G., Groeseneken, G.: On-chip circuit for
massively parallel BTI characterization. In: IEEE Intl. Integrated Reliability Workshop Final
Report (IRW), pp. 90–93 (2011)
25. Stathis, J.H., Zafar, S.: The negative bias temperature instability in MOS devices: A review.
Microelec. Reliability 46(2–4), 270–286 (2006)
26. Tsai, M.C., Lin, Y.W., Yang, H.I., Tu, M.H., Shih, W.C., Lien, N.C., Lee, K.D., Jou, S.J.,
Chuang, C.T., Hwang, W.: Embedded SRAM ring oscillator for in-situ measurement of NBTI
and PBTI degradation in CMOS 6T SRAM array. In: VLSI Design, Automation, and Test
(VLSI-DAT), 2012 International Symposium on, pp. 1–4. IEEE (2012)
27. Velamala, J., Sutaria, K., Shimuzu, H., Awano, H., Sato, T., Wirth, G., Cao, Y.: Logarithmic
modeling of BTI under dynamic circuit operation: Static, dynamic and long-term prediction.
In: Proc. Intl. Reliab. Phys. Symp. (IRPS), pp. CM.3.1–CM.3.5 (2013)
28. Velamala, J.B., Sutaria, K.B., Sato, T., Cao, Y.: Aging statistics based on trapping/detrapping:
Silicon evidence, modeling and long-term prediction. In: Proc. Intl. Reliab. Phys. Symp.
(IRPS), pp. 2F2.1–2F2.5 (2012)
29. Velamala, J.B., Sutaria, K.B., Sato, T., Cao, Y.: Physics matters: statistical aging prediction
under trapping/detrapping. In: Proc. ACM/IEEE Design Automation Conf. (DAC), pp. 139–
144 (2012)
30. Wang, W., Yang, S., Bhardwaj, S., Vrudhula, S., Liu, F., Cao, Y.: The impact of NBTI effect on
combinational circuit: Modeling, simulation, and analysis. Trans. VLSI 18(2), 173–183 (2010)
31. Wirth, G., da Silva, R., Kaczer, B.: Statistical model for MOSFET bias temperature instability
component due to charge trapping. IEEE Trans. Electron Dev. 58(8), 2743–2751 (2011)
32. Zafar, S., Callegari, A., Gusev, E., Fischetti, M.V.: Charge trapping related threshold voltage
instabilities in high permittivity gate dielectric stacks. J. Appl. Phys. 93(11), 9298–9303 (2003)
Chapter 6
Compact Modeling of BTI for Circuit
Reliability Analysis
Abstract The aging process due to Bias Temperature Instability (BTI) is a key
limiting factor of circuit lifetime in contemporary CMOS design. Threshold voltage
shift induced by BTI is a strong function of stress voltage and temperature.
Furthermore, BTI consists of both stress and recovery phases, depending on the
dynamic stress conditions. This behavior poses a unique challenge for long-term
aging prediction for a wide range of stress patterns encountered in today’s circuits.
Traditional approaches usually resort to an average, constant stress waveform
to simplify the lifetime prediction. They are efficient, but fail to capture the
reality of circuit operation, especially under Dynamic Voltage Scaling (DVS) or
in analog/mixed signal designs where the stress waveform is much more random. In
this chapter, we present a suite of modeling solutions that enable aging simulation
under all dynamic stress conditions. The key innovation of this chapter is to develop
compact models of BTI when the stress voltage is varying. The results cover
the underlying physics of two leading mechanisms, Reaction–Diffusion (R–D)
and Trapping/Detrapping (T–D). Moreover, silicon validation of these models
is performed at 45 and 65 nm technology nodes, at both device and circuit
levels. Leveraging the newly developed BTI models under DVS and random input
waveforms, efficient aging simulation is demonstrated in representative digital and
analog circuits. Our proposed work provides a general and comprehensive solution
to circuit aging analysis under random stress patterns.
1 Introduction
Aggressive scaling of CMOS technology has been driving the evolution of elec-
tronics and the increase in the total number of transistors per chip. Moore’s law
predicts that the total number of transistors placed on a chip will be approximately
doubled every 18 months. Benefiting from the scaling of CMOS technology, system-
on-chips (SoCs) further improve the system capability by integrating both analog
and digital circuits on a single platform. Present SoCs pack close to 800 million
transistors incorporating analog and mixed signal units such as data converters (A/D,
D/A), PLLs, etc., providing higher performance at a lower cost.
On the other side, aggressive CMOS scaling results in serious reliability threats
such as Bias Temperature Instability (BTI), Channel Hot Carrier (CHC) and Time
Dependent Dielectric Breakdown (TDDB) [1–10]. These reliability effects become
more pronounced partially due to process scaling techniques that improve device
and circuit performance. Moreover, scaling of supply voltage and threshold voltage
does not go hand in hand with scaling of device feature size. This greatly increases
vertical and lateral electrical field causing the exacerbation of reliability issues. As
an aftermath of these issues, circuit performance degrades over time and eventually
results in functionality error, which is called circuit aging.
Threshold voltage shift (
Vth ) due to negative BTI (NBTI) in PMOS devices
is the dominant aging mechanism in scaled digital and analog circuits. Accurate
characterization of the aging rate is essential to lifetime evaluation and guard
banding [11–24]. One major challenge in reliability prediction of NBTI comes
from the recovery phase, which causes the dependence on the stress pattern and
its history. The situation is more complex under Dynamic Voltage Scaling (DVS)
in low power digital design. Analog/mixed signal circuits face similar challenges
due to random input pattern. Modulation of voltage levels due to such operations
results in partial recovery of Vth and thus, the average or the worst case pattern does
not provide an accurate prediction of the lifetime. Figure 6.1 presents the operation
pattern in a 65 nm dual core Intel processor and
Vth under such an operation.
1.00
1.0
Simulated ΔVth
ΔVth (a.u.)
0.5
ΔVth from random DVS waveform
ΔVth using the averaged pattern
0.0
0.0 5.0x104 1.0x105
Time (s)
Fig. 6.1 4Vth predicted from the averaged voltage significantly deviates from that simulated by
dynamic aging models
6 Compact Modeling of BTI for Circuit Reliability Analysis 95
When VDD fluctuates, the degradation is strongly non-linear and varies according
to the stress condition. The average value of the stress voltage predicts a lower
or higher Vth shifts, as compared to that by the real-time model. Therefore, real-
time prediction of Vth shift allows a realistic and accurate assessment of circuit
aging, avoiding over-design to compensate the degradation. Overall, it is critical
to understand NBTI with recovery effects, simulate its effect under dynamic stress
voltages, and ensure reliable circuit operation for the desired lifetime.
To date, research work on aging mechanisms has mainly focused on device and
reliability physics [24–31]. Proprietary efforts also exist in leading industrial com-
panies to develop their own reliability models and tools [32–35]. Such knowledge
needs to be propagated into circuit design and CAD tools to assess the impact of
device degradation on various circuit performance metrics. As BTI is the dominant
aging mechanism, the accuracy and efficiency of BTI models directly affect the
quality of reliability analysis.
In a contemporary SoC, a wide variety of circuit operation patterns co-exists.
The diversity of circuit operation presents a unique challenge to predict the lifetime
under the BTI effect: A device operating under constant stress voltage needs a
static aging model; different from static stress, devices under AMS operation may
experience totally random stress during their lifetime. For digital circuits, they
employ DVS extensively to balance the workload and power consumption. Under
such operation, a static model is insufficient to accurately determine circuit aging.
Therefore, a model which can analyze aging under any random input stress pattern
is necessary.
Different from AMS circuits, large-scale logic designs encounter periodic inputs
with different duty cycles and frequency. Although a random stress model can
handle such situations, long-term aging simulation is not efficient and requires
expensive simulation time. To improve the efficiency, a long-term BTI model is
preferred which predicts an upper bound of threshold voltage shift for a periodic
input. In summary, a set of new BTI models are needed to improve circuit reliability
prediction in both VLSI and AMS design under dynamic operations.
In reliability physics, NBTI was first assumed to be induced by the Reaction–
Diffusion (R–D) mechanism, where holes initiate the breaking of Si–H bonds
(reaction phase) at the silicon substrate/oxide interface [4, 28, 31]. The broken
hydrogen species diffuse away from the interface into gate oxide and the poly-
silicon gate (diffusion phase) and thus, interface charges are generated. This
mechanism leads to the classical power-law model of Vth shift. More recent research
suggests a role of hole Trapping/Detrapping (T–D) for Vth shift of a device under
stress [36–51]. According to the T–D theory, if the trap gains sufficient energy it may
capture a charge carrier, thus reducing the number of available carriers in the channel
and modulating the threshold voltage of a device. A logarithmic time dependence
96 K.B. Sutaria et al.
of Vth shift is observed under T–D as opposed to the power-law behavior. There are
research works which have suggested a combination of R–D and T–D. Whether T–
D or R–D dominates the Vth shift behavior depends on the fabrication technology.
For reliable circuit design, it is important to have an access to both R–D and T–D
based aging models in order to accurately determine the guard-band of a design. In
this chapter, we present a set of BTI models based on R–D and T–D theory which
can predict threshold voltage shift under static, random and long-term stress input.
Such a set of aging models provides an accurate and efficient solution for reliable
design.
The primary impact of BTI at the device level is the gradual increase in transistor
Vth , whereas the degradation of other device parameters are less pronounced. Since
the threshold voltage directly affects the delay of a digital gate, operating frequency
of a logic path decreases temporally. Similarly, in AMS circuits, increase in Vth
degrades the gain, trans-conductance and other performance metrics. To estimate
circuit aging rate, the fundamental step is to model device Vth shift under given
stress voltage and temperature. In this section, compact models based on both R–D
and T–D mechanisms are presented.
Two prevalent theories, Reaction–Diffusion and Trapping/Detrapping, are pro-
posed to explain BTI. Figure 6.2 shows the cross section of a PMOS device which
shows the difference between these mechanisms. R–D is a two-step process namely:
Reaction and Diffusion. According to the R–D theory, the stress voltage causes
covalent bonds (Si–H) at interface to break, which is Reaction. In the Diffusion
step the broken hydrogen atoms combine to form H2 , which diffuses towards gate.
For current thin-oxide devices, diffusion in poly-gate dominates the incremental
behavior of Vth shift. The interface states left at Si–SiO2 due to H2 diffusion increase
the threshold voltage. This leads to a power law relation (tn ) with time exponent
(n) 1/6, which should be independent of process parameters. 4Vth exponentially
depends on the stress voltage and temperature [28, 31].
According to the T–D theory shown in Fig. 6.2b, there exist a number of
defect states with different energy levels, and capture and emission time constants.
Threshold voltage increases when a trap captures a charge carrier from the channel
a b
D p+ D p+
ND
xxxxxx
Channel
G H2 Nit G
Fig. 6.2 Two mechanisms Carriers
for BTI: (a) Reaction
Diffusion (RD), and (b) S p+ S p+
Trapping/Detrapping (TD)
6 Compact Modeling of BTI for Circuit Reliability Analysis 97
of a MOS device. Reduced number of channel carriers also causes drain current
to decrease. The probability of trapping and detrapping is a function of capture and
emission time constants, respectively. The gradual change in the number of occupied
traps results in a logarithmic time evolution of Vth shift, different from the power law
behavior in R–D. The stress voltage and temperature has an exponential impact on
the degradation rate, similar as that in the R–D theory [47].
There are two critical steps based on the Reaction–Diffusion theory. Reaction: Si–H
or Si–O bonds at the substrate/gate oxide interface are broken under the electrical
stress [16–18]. The interface charges are induced in gate oxide and further in poly-
gate, which cause the increase of Vth . Given the initial concentration of the Si–H
bonds (N0 ) at the interface and the concentration of inversion carriers (P), the
generation rate of the interface traps is given by [28]:
dNI T
D kf .N0 NI T / P kr NH NI T (6.1)
dt
where, kf and kr are forward and reverse reaction rates. The generation rate is
exponentially dependent on the stress voltage and temperature. During the initial
phase of stress period, the trap generation is slow with respect to time. Thus dNIT /dt
0 and NIT < < N0 , reducing Eq. (6.1):
kf
NH NI T D N0 P (6.2)
kr
dNH2 d 2 NH2
DC (6.3)
dt dx 2
C is the diffusion constant for hydrogen molecules in poly–Si, which depends on
the activation energy. To solve differential Eqs. (6.1) and (6.3) to derive a compact
model, an approximate profile of H2 concentration in gate oxide and poly–Si is
assumed, as shown in Fig. 6.3. NH2 is the concentration of hydrogen p molecules at
distance ‘x’ from Si–SiO2 interface at time (t), where x is given by C t. Diffusion
98 K.B. Sutaria et al.
Z
x.t/
1 1 p
2 .1 C ı/ NH2 .0/ tox C NH2 .0/ C t (6.6)
2 2
NH2 (0) is H2 concentration at Si–SiO2 interface while ı NH2 (0) is the density at
Si–Poly interface. Finally, using NH2 D kh NH 2 , NIT can be represented as
NH2(t)
NH2(0)
Poly-Silicon
p ! 23
kh kf N 0 P p 13
NI T D .1 C ı/ tox C C t (6.7)
kr
where, inversion hole density P D Cox (Vgs Vth ). Based on the interface charges,
threshold voltage shift can be derived as: 4Vth D qNIT /Cox . This R–D based model
predicts the Vth shift under any given constant stress voltage, temperature and time.
The final form of the degradation is given as:
p 2n
qNI T
Vth D Cox
D A .1 C ı/ tox C C t
r (6.8)
qtox
AD "ox
1=2n
K2 Cox Vgs Vth exp 2Eox
E0
Several works have presented the evidence of the T–D mechanism, such as the
discrete Vth shift, especially during the recovery phase in NBTI observed with the
fast measurement techniques [52, 53]. Figure 6.4 illustrates the physical picture
of T–D: when a negative bias voltage is applied to the gate of a PMOS device,
the trap energy (relative to the Fermi energy level) is modulated. If the trap gains
sufficient energy, it may capture a charge carrier, thus reducing the number of
available carriers in the channel [38]. The charged trap state modulates the local
Vth and acts as a scattering source, reducing the effective mobility [38]. Faster traps
(with shorter time constants) having a higher probability of capturing carriers; the
occupation probability increases with voltage and temperature. Trapping and de-
trapping events are stochastic in nature and hence a compact model is based on the
statistics of trap properties.
The basic modeling assumptions based on T-D theory are the same as the ones
used in modeling of low-frequency noise, since the charge trapping dynamics
(capture and emission time statistics) that contribute to the degradation of device
performance over time is similar [40–44]. The main assumptions of the trap
properties are:
• The number of traps follows a Poisson distribution, which is common for a
discrete process.
• Capture and emission time constants are uniformly distributed on the logarithmic
scale. This microscopic assumption is critical to derive the logarithmic time
evolution at the macro scale.
100 K.B. Sutaria et al.
where p01 (dt) D 1/ c and p11 (dt) D 1 p10 (dt) D 1/ e . Integrating it from t0 to t:
eq
P01 .t C t0 / D 1 e t=eq C P01 .t0 / e t=eq (6.10)
c
where 1/ eq D 1/ c C 1/ e . c, e are random in nature, representing capture and
emission time constants respectively, and dependent on bias point and temperature.
The values are determined by [38]:
c D 10p .1 C e q / (6.11)
e D 10p 1 C e Cq (6.12)
where p 2 [pmin , pmax ]. pmin and pmax define the time constants for fastest and
slowest traps respectively (pmin 1 and pmax > 10). This assumption of the existence
of defects with wide distribution of time constants is in line with recent NBTI
data [38]. Since p is assumed to be uniformly distributed, the characteristic time
constants are uniformly distributed on logarithmic scale. The parameter q is given
by (ET EF )/kT, where ET is the trap energy and EF is the Fermi energy level.
The trap energy (relative to Fermi energy) is a function of applied electric field.
Consequently, c and e are dependent on voltage and temperature.
The occupation probability of the trap at time t, assuming that it is under
constant stress from time t0 D 0 is obtained by substituting P01 (0) D 1 P01 (0) D 0
in Eq. (6.10), integrating P01 and multiplying with the number of available traps,
the average number of occupied traps obtained by substituting the logarithmic
distribution of time constants, and the U shaped distribution of trap energies:
EZ
T max 10p
Z max t
N g .ET / dET e u 1
n.t / D EF
: du (6.13)
ln 10 .pmax pmin / 1 C exp ETkT u
0 10p min t
where g(ET ) is the trap energy distribution, and pmin and pmax represent fast and slow
traps, respectively. The trap energy, ET changes as a function of electric field (Eox ).
Assuming pmin 1 and pmax > 10, and ET 1/Eox ,
6 Compact Modeling of BTI for Circuit Reliability Analysis 101
DVth (a.u.)
0.8
1.3V, 30°C
0.4
1.1V, 30°C
4x103 8x103
Time (s)
N ˇVg E0
n.t / D exp exp ŒA C B log 10p max t
Equation (6.14) describes the aging under a constant stress voltage and temperature.
Similar as previous R–D model, it is an exponential function of the stress voltage,
temperature and Tox . Furthermore, it has a statistical nature with N, an index for the
number of traps per device. For the simplicity, Eq. (6.14) is written as:
Vth .t / D ŒA C B log .1 C C t /
(6.15)
Equations (6.8) and (6.15) represent Vth shift based on R–D and T–D theory
respectively. Aging prediction of these models is compared with 45 nm silicon
data in Fig. 6.5. Within the range of data fluctuations, both models match well with
measurement under the static stress. The exponential dependence on stress voltage
and temperature are correctly captured by both aging models.
Today’s circuits typically have a reduced activity factor (or duty cycle) and dynamic
voltage scaling (DVS), to reduce power consumption. Therefore, a significant
portion of the operation is under lower supply voltage, resulting in large recovery.
Since the degradation is highly sensitive to the stress voltage, DVS leads to different
amounts of circuit aging. For a random stress pattern, it is necessary to derive
voltage dependent recovery to accurately predict Vth shift.
102 K.B. Sutaria et al.
The hydrogen atoms that are generated during the stress phase will recover if the
stress voltage is removed completely. Atoms close to Si–SiO2 interface anneal the
broken Si–H bond while atoms deep in Poly-gate continue to diffuse away, leading
to incomplete recovery of Vth . Approximate profile of hydrogen species is shown in
Fig. 6.6. Hydrogen atoms diffuse quickly in oxide and anneal some of Si–H bonds
in a short time. If a stress voltage is removed after time (t1 ), the interface charges
generated at the given time are NIT (t1 ). The total number of charges to be annealed
is given by NA IT (t). Interface charges at a given time t is given by
as shown in Fig. 6.6; the number of annealed traps can be divided in to two parts:
(1) recombination of H2 in oxide, and (2) back diffusion of H2 in poly-gate [31].
Thus we have
1p
NI T .t / D 2 1 te C
A
2 C .t t1 / NH2 .0/ (6.17)
2
In Eq. (6.17) above, 1 and 2 are the back diffusion constants. From Eqs. (6.7) and
(6.17), we get
p !
21 te C 2 C .t t1 /
NIAT .t / D NI T .t / p (6.18)
.1 C ı/ tox C C t
Substituting Eq. (6.18) in Eq. (6.16), simplifying equations and using 4Vth D qNIT /
Cox , we get the recovery model as:
p !
21 te C 2 C .t t1 /
Vth .t / D Vth .t1 / 1 p (6.19)
.1 C ı/ tox C C t
NH2(t)
Poly-Silicon
NAH2(0)
Fig. 6.7 Approximate diffusion profile of hydrogen atoms in PMOS under dynamic stress voltage:
(a) from high to low and (b) from low to high
The equation above represents Vth recovery when stress voltage is completely
removed. This model is not applicable to the Vth behavior when the stress voltage
is lowered or increased. Under DVS operation, the stress voltage is changed up
and down. Diffusion profile of hydrogen atoms when the stress voltage changes
from high to low is different from that when the voltage changes from low to
high [54]. Time-dependent diffusion profiles are shown in Fig. 6.7 for a voltage
change from high (V1 ) to low (V2 ) at time (t0 ). The resulting Vth shift is divided in
to two components. The first component is the diffusion due to the lower voltage
(V2 ) which is the un-shaded region in the figure. The shaded part is the recovery
component that gradually decreases with time. Eventually, the 4Vth degradation is
driven by the lower voltage. To derive a closed form solution, we start with 4Vth
under the lower voltage:
p 2n
Vth .t / D A2 .1 C ı/ tox C C .t t0 / C s.t / (6.20)
where A2 is the function of the lower voltage (V2 ). s(t) is time-dependent initial
distance of hydrogen diffused, shown as the shaded area in first graph of Fig. 6.7a.
As time progresses, these hydrogen atoms recover, given by the equation:
p !
21 Te C 2 C .t t0 /
Vthr .t / D Vth .T / 1 p (6.21)
.1 C ı/ tox C C t
104 K.B. Sutaria et al.
By substituting t D t0 in Eq. (6.20) and equating the results with Eq. (6.21), we get
a compact model which predicts Vth shift when stress voltage changes from a high
to low voltage:
0 v !12n
u p
p p u 2 T C C .t t /
Vth .t / D @ A2 C .t t0 / C tVth .t0 / 1 A
1 e 2 0
2n 2n
p
.1 C ı/ tox C C t
(6.22)
1 and 2 are back diffusion constants, same as those in the recovery model. This
equation is capable of predicting the non-monotonic behavior of initial recovery and
eventually converging with rising Vth shift due to the lower voltage.
When the stress voltage changes from a low (V2 ) to high (V1 ) value at time (t0 ),
the diffusion profile approximation is shown in Fig. 6.7b. In this case, there is no
recovery component. The diffusion due to low voltage continues to diffuse at the
same rate. However, the diffusion front due to higher voltage is dominant, and Vth
shift eventually converges. We start from Eq. (6.20) in the derivation. In this case,
A2 is the function of the higher voltage (V1 ). The diffusion profile of hydrogen under
V1 follows the shaded region in Fig. 6.7b. Vth shift under the lower voltage (V2 ) at
time (t0 ) is given by:
p 2n
Vth .t0 / D A1 .1 C ı/ tox C C t0 (6.23)
Substituting t D t0 in Eq. (6.20) and equating with Eq. (6.23), we arrive at the model
which describes the behavior when voltage transits from a low to a high value:
hp p p p i2n
Vth .t / D 2n
A2 .1 C ı/ tox C C .t t0 / C 2n A1 .1 C ı/ tox C C t
(6.24)
Table 6.1 summarizes the models under random stress, and static models based on
the R–D theory. Figure 6.8 validated the model prediction with 45 nm silicon data
under random stress patterns. The stress and recovery behaviors of threshold voltage
are accurately captured when device is stressed under arbitrary voltage (Fig. 6.8a).
Fig. 6.8b emphasizes the model prediction of the recovery component. In this figure,
a device is stress under 1.3 V and after 1,000 s, the stress voltage is lowered to 1.2 V.
As a result of excessive stress until 1,000 s, the device initially recovers and then
Vth shift increases under the lower voltage. This non-monotonic behavior is well
predicted by the dynamic model [54].
6 Compact Modeling of BTI for Circuit Reliability Analysis 105
a b
0.5
0.0
0 20000 40000 102 103 104
Time (s) Time (s)
Fig. 6.8 Random input model validation (a) arbitrary stress pattern and (b) non-monotonic
behavior of Vth shift under a lower voltage
Δ2
t0
Time (sec)
In this section, trapping/de-trapping based models are presented which can handle
random stress waveform. Since the degradation is highly sensitive to the voltage
(Eq. (6.15)), dynamic voltage scaling leads to different amounts of circuit aging.
The voltage tuning is categorized into two cases. In case 1, stress voltage is changed
from lower voltage V1 to a higher voltage V2 and in case 2, V2 is lower than V1 at
time t0 . To handle such a voltage transition, using Eq. (6.10) and a non-zero time,
t0 , to calculate the occupation probability at time t (time elapsed after t0 ), as shown
in Fig. 6.9 [48].
106 K.B. Sutaria et al.
where eq2 , c2 represent the time constants under voltage V2 . Using Eqs. (6.5) and
(6.6), eq1 D eq2 , since eq depends only on parameter p, which is independent of
the voltage. Substituting this property in Eq. (6.25):
eq eq t=eq
P01 .t C t0 / D 1 e t=eq e e .tCt0 /=eq (6.26)
c1 c2
ΔVth (a.u.)
VDD=1.5V VDD=1.5V
0.4
0.0
VDD=1.2V
1.8V 1.65V
0.4
0.0
50s 200s
prediction under DVS. The two components in Eq. (6.27) play an important role
in long-term prediction under multiple cycles.
Higher recovery is seen when the device is stressed at 1.8 V compared to 1.65 V
as more traps are captured under higher stress voltage. The T–D model captures such
behavior. As stress voltage continues to be 1.2 V after 200 s, the temporary recovery
is overwhelmed by the capture of traps under lower stress voltage. Increasing stress
time causes the degradation to converge to this constant stress voltage at 1.2 V.
This behavior is predicted from Eq. (6.27), where the second component dominates
initially, resulting in the recovery, while the second component decays down and the
first component takes over, leading to the stress behavior under the second voltage.
Figure 6.11 evaluates the model prediction, with different periods under the same
voltage. In this study, the device is initially stressed under 1.8 V, for 50 or 200 s;
108 K.B. Sutaria et al.
a b
1.6
Symbols: 65nm Data Symbols: 65nm Data
Line: T-D model Line: T-D model
1.2
1.8V- 1.2V with α =0.5 1.8V- 0V with α =0.5
ΔVth (a.u.)
0.8
0.4
0.0
3000 6000 9000 60000 63000
Time (s) Time (s)
c 1.6
d
Symbols: 65nm Data Symbols: 65nm Data
Line: T-D model Line: T-D model
1.2
1.8V- 0V with α=0.97
1.8V- 0V with α =0.03
ΔVth (a.u.)
0.8
0.4
0.0
12000 15000 105000 108000
Time (s) Time (s)
Fig. 6.12 Validation of the dynamic stress model under different voltages and duty cycles
then the voltage is switched to 1.65 V. As the stress voltage is lowered, a temporary
recovery behavior is observed due to the emission of excessive amount of trapped
charges. The T–D model captures such behavior in both cases. In the case where the
device is first stressed under 1.8 V for 200 s, the stress time is higher as compared
to the case where the initial stress is 50 s causing higher Vth shift. However, since
in both cases, the device is later stressed for a much longer time ( 10 ks) at 1.65 V,
the degradation converges to the constant stress condition at 1.65 V. This validation
helps predict the aging under various switching activities (˛) much needed for aging
evaluation of digital circuits.
Figure 6.12 presents the measurement under different patterns of voltages and
duty cycles. Figure 6.12a and b show the multiple cycle prediction stressed at 1.8,
1.2 V and 1.8, 0 V at ˛ D 0.5. When the voltage is lowered from 1.8 V, a recovery
is observed from silicon data, which is well predicted from the model. Furthermore,
6 Compact Modeling of BTI for Circuit Reliability Analysis 109
if the voltage is lowered down to 0 V, the recovery is more pronounced (Fig. 6.12b).
The cycle-to-cycle model in Eq. (6.27b) is scalable with different duty cycles.
Figure 6.12c and d show the validation of our model with silicon data at very low
(˛ D 0.03) and very high (˛ D 0.97) duty cycles, respectively. A voltage transition
even for a short duration results in the sudden shift in threshold voltage, due to the
fast trapping/de-trapping (Fig. 6.12c and d).
Random input models derived in this section are thoroughly validated with 45
and 65 nm data. These models provide an accurate and efficient way to identify
aging under arbitrary stress patterns, allowing designers to adequately guard-band
their design. Such models can be more beneficial to AMS designs where the number
of transistors are less than that in a digital design and the input is highly random.
However for large scale digital circuits, using random stress models is insufficient
to predict the lifetime of a high-speed design. Thus, there is a clear need for long-
term aging models based on both R–D and T–D theories, which can estimate a tight
upper bound of aging, without tracking the behavior cycle by cycle. The following
section presents the solutions.
Under the extensive usage of DVS, accurate Vth shift can be predicted using the
random input stress models derived in previous section. However for a large scale
logic design, applying such random stress models will not be efficient in circuit
simulation. Most digital circuits operate at a regular frequency periodically with
a representative duty cycle. A long-term model will be desirable for efficient aging
prediction, which directly estimates aging at a given operation time without tracking
the stress-recovery over many cycles. This long-term model predicts a tight upper
bound after multi-cycle operations under DVS. Based on the multi-cycle model
in previous section, stress (
Vths,m ) and recovery (
Vths,mC1 ) in Fig. 6.13 can be
connected to derive a long-term model based on both R–D and T–D theories.
It is possible to obtain a closed form solution to predict the upper bound of
Vth
for different clock cycle (Tclk ), duty cycle (˛) and stress time for a circuit oscillating
between two voltages. To derive a closed form solution,
Vths,m and
Vths,mC1 are
connected iteratively using random stress models as:
2 p 32n
1=2n 1=2n 1=2n
2n
A2 X 1 C ˇm C ˇm ˇm1 C : : : C
Vths;mC1 D 4 2n
p p 1=2n 1=2n 1=2n
5
A1 Y C C ˛Tclk 1 C ˇm C ˇm ˇm1 C : : :
(6.28)
110 K.B. Sutaria et al.
ΔVthr,m+1
ΔVthr,m
Tclk
where,
p p
X D .1 C ı/ tox C C ˛Tclk ; Y D C .1 ˛/ Tclk
p
21 te C 2 C .1 ˛/ Tclk
ˇ D1 p
.1 C ı/ tox C C mTclk
Using ˇ 1 < ˇ 2 < ˇ m-1 < ˇ m and geometric series approximation, the upper bound
of degradation is derived as:
p p p 2n
1 1
Vths;mC1 D 2n
A2 X C A1 Y C C ˛Tclk
2n
1 ˇ 1=2n 1 ˇ 1=2n
(6.29)
The new long-term model is capable of predicting the upper bound of
Vth for cycle
of two non-zero voltages [54].
Figure 6.14 confirms that for multiple cycles, the long-term model predicts the
same result as the model under random input, given a constant duty cycle. Such
model predicts a tight upper bound under the periodic stress.
Similar to the derivation of the R–D model, a long-term model based on the T–
D theory is obtained. Based on the multi-cycle model in the previous sub-section,
Vths,m and
Vths,mC1 are connected by:
1.0
α=0.9
ΔVth (a.u.) 0.7
α=0.5
α=0.1
0.3
0.0
0 1x105 2x105
Time (s)
Fig. 6.14 Under constant duty cycles, the long-term model is consistent with the model for
random input
Using Eq. (6.30) and repeatedly replacing the
Vths,mC1 by
Vths,i for i D m, : : : ,1,
we get:
0 1
X
m Y
m
Vths;m D 1 ŒA C B log .1 C C ˛TC lk /
@1 C ˇ1;:j :ˇ2;:j A
iD1 j DmiC1
0 1
X
m Y
m
C 2 ŒA C B log .1 C C .1 ˛/ TC lk /
ˇ1;m @1 C ˇ1;:j 1 :ˇ2;:j A
iD1 j DmiC1
(6.31)
Since obtaining a closed-form solution for Eq. (6.31) is not straight-forward, we use
the property ˇ 1,m-1 < ˇ 1,m and ˇ 2,m-1 < ˇ 2,m ,
2
Vths;mC1 1 ŒACB log .1CC ˛TC lk /
1Cˇ1;:m :ˇ2;:m C ˇ1;:m :ˇ2;:m C : : :
2
C2 ˇ1;m ŒACB log .1CC .1˛/ TC lk /
1Cˇ1;:m :ˇ2;:m C ˇ1;:m :ˇ2;:m C : : :
(6.32)
Equation (6.32) is a geometric series and the upper bound of degradation is:
ΔVth (a.u.)
0.6
0.4
Tclk=1000s
0.2
Equation (6.33) is sensitive to the duty cycle, ˛ (ratio of time under V1 to time under
V2 ), time period (sum of operation times under V1 and V2 for a single cycle), and the
stress voltage. The long term model captures the tight upper bound of cycle-to-cycle
prediction. Furthermore, Fig. 6.15 presents the aging behavior under a wide range
of duty cycles as validated with 65 nm silicon data. The degradation rate changes
rapidly when ˛ approaches 0 or 1, but gradual for intermediate duty cycles. This
behavior is due to the sudden change in degradation at the beginning of the stress
and recovery phase, due to the voltage dependent time constants. It is well predicted
by a single equation in the long-term model [48].
New models are derived from basic physics and are comprehensively validated with
discrete device data in previous sections. In this section, these models are further
integrated with design tools at the circuit level. Different types of circuits have
different requirements on reliability tools, due to the diversity in circuit operation
patterns. Digital circuits usually operate at two distinct voltage levels (VDD and
GND) as opposed to continuous voltages in analog circuits. Thus long-term models
are more appropriate for digital circuits and random input stress models are better
for AMS designs. Aging models required to predict circuit lifetime under any given
stress conditions are presented in previous sections. Based on them, we present
the aging analysis of digital and AMS designs by implementing aging models in
SPICE for real time simulation under any possible stress conditions, supporting
design practice for reliability. The model prediction is validated with 45 nm ring
oscillator data.
6 Compact Modeling of BTI for Circuit Reliability Analysis 113
VRING
VDIV
Frequency Output
Divider
VRING : ROVDD
Venable VDIV : VDDof peripheral circuits
Venable : OscillationEnable
To validate the new models at circuit level, silicon data is measured from 45 nm
ring oscillators (ROs). The frequency change (4F/F) of RO is measured as a direct
index of the degradation, which is proportional to PMOS transistor threshold voltage
change under NBTI. Figure 6.16 presents the test circuit of ROs used for aging
characterization. Frequency change in 11 stage ring oscillator is monitored during
the test. The ring oscillator is activated by the enable (Venable ) pin. Different from
traditional RO based aging test, the supply voltage of this circuit, VRING , is switched
between VDD (stress mode) and 0 (recovery mode) at different duration, in order to
emulate dynamic voltage scaling encountered in logic circuits. The data is collected
at regular intervals at multiple supply voltages and temperature. The pin, VDIV , is
implemented to control the frequency divider and the output buffer. It also helps to
eliminate the impact on frequency shift of ROs due to aging of peripheral circuits,
thus giving a clean data [54].
Vth of a device in RO has a strong dependence on the dynamic scaling of supply
voltages due to the recovery effect. Figure 6.17a shows the basic test pattern in a
ring oscillator with alternate active and sleep modes, each for 5,000 s duration. The
114 K.B. Sutaria et al.
a b
1: active sleep active sleep 1.0 active sleep
1.0
active sleep active
0.8
0.8
ΔF/F (a.u.)
0.6
0.6
0.4
0.4
Symbols: 45nm data Symbols: 45nm data
Aging models 0.2 Aging models
0.2
Fig. 6.17 (a) Model validation with circuit-level DVS data. (b) Aging under different stress
patterns well captured by the model
increase of frequency degradation in the active mode and its decrease in the sleep
mode are well predicted by the new random stress models based on R–D or T–D
theory. Figure 6.17b shows the impact of the recovery on the frequency shift of RO.
Although the stress times are same for both patterns, the recovery in PMOS device
causes over all degradation to be less. These effects are successfully captured by the
aging models described previously.
Vbias P
sin (ω1t)
MIXER P P Vcm
a b
0.6
Average ΔVth
Acutal ΔVth
0.6
Output Voltage (V)
ΔVth (a.u.)
0.4
0.3
Average ΔVth
(Sim. time = 1.50s) 43mV distortion in
Acutal ΔVth 0.2 the amplitude is not
0.0
(Sim. time = 3.44s) captured by the
average aging model
Fig. 6.19 Aging analysis with the average value fails to capture the distortion. (a) Vth shift
prediction (b) output voltage of the amplifier
increase the internal time steps which take longer simulation time compared to the
average aging stress pattern in Fig. 6.19.
4 Summary
Aging issue has become increasingly important as the technology scales. BTI,
especially NBTI in a PMOS device, has emerged as a dominant mechanism in digital
as well as AMS designs, impacting circuit performance over time. The challenges
are further compounded with Dynamic Voltage Scaling (DVS) and random inputs
that are encountered in a realistic operation. In this chapter, both R–D and T–
D theories are comprehended in detail. Compact BTI models for both R–D and
116 K.B. Sutaria et al.
Fig. 6.20 Hierarchical development of BTI models for circuit aging analysis
T–D theories are then derived and validated using 45 and 65 nm silicon data.
Figure 6.20 presents distinct levels of reliability modeling. Three different types of
models are developed for diverse design needs: Static, Random stress and Long-
term models. The models are implemented in the SPICE environment, through
Verilog-A. Depending on the operation pattern, appropriate models can be applied
to circuit-level aging analysis and help improve circuit reliability.
6 Compact Modeling of BTI for Circuit Reliability Analysis 117
References
1. C. Constantinescu, “Trends and challenges in vlsi circuit reliability,” IEEE Computer Society,
pp. 14–19, 2003.
2. Semiconductor Industry Associate, International Technology Roadmap for Semiconductors,
2010. (Available at http://public.itrs.net).
3. D. K. Schroder and J. A. Babcock, “Negative bias temperature instability: Road to cross in
deep submicron silicon semiconductor manufacturing,” Journal of Applied Physics, vol. 94,
no. 1, pp. 1-18, 2003.
4. K. O. Jeppson and C. M. Svensson, “Negative bias stress of MOS devices at high electric fields
and degradation of MNOS devices,” J. Applied Physics, vol. 48, no. 5, pp. 2004-2014, 1977.
5. K. Bernstein, D. J. Frank, A. E. Gattiker, W. Haensch, B. L. Ji, S. R. Nassif, E. J. Nowak,
D. J. Pearson, N. J. Rohrer, “High-performance CMOS variability in the 65-nm regime and
beyond,” IBM Journal of Research and Development, vol. 50, no. 4/5, pp. 433-449, July 2006.
6. S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, V. De, “Parameter variations and
impact on circuits and microarchitecture,” Design Automation Conference, pp. 338-342, 2003.
7. F. Arnaud, L. Pinzelli, C. Gallon, M. Rafik, P. Mora, F. Boeuf, “Challenges and opportunity in
performance, variability and reliability in sub-45 nm CMOS technologies,” Microelectronics
Reliability, vol. 51, no. 9-11, pp. 1508-1514, Sept.-Nov. 2011.
8. B. H. Calhoun, Y. Cao, X. Li, K. Mai, L. T. Pileggi, R. A. Rutenbar, and K. L. Shepard, “Digital
circuit design challenges and opportunities in the era of nanoscale CMOS,” Proceedings of the
IEEE, vol. 96, no. 2, pp. 343-365, Feb. 2008.
9. A. Ramadan, “Compact model council’s standard circuit simulator interface for reliability
modeling,” International Reliability Physics Symposium, pp. 2A.5.1-2A.5.6, 2013.
10. J. Hicks, D. Bergstrom, M. Hattendorf, J. Jopling, J. Maiz, S. Pae, et al., “45 nm transistor
reliability,” Intel Technology Journal, vol. 12, no. 02, pp. 131-144, Jun. 2008.
11. S. V. Kumar, C. H. Kim and S. S. Sapatnekar, “Adaptive techniques for overcoming perfor-
mance degradation due to aging in digital circuits,” Asia and South Pacific Design Automation
Conference, pp. 284-289, 2009.
12. V. Reddy, “Impact of negative bias temperature instability in digital circuit reliability,”
International Reliability and Physics Symposium, pp. 248-253, 2002.
13. R. Zheng, et al., “Circuit aging prediction for low-power operation,” Customs Integrated
Circuits Conference, pp. 427-430, 2009
14. G. Chen, K. Y. Chuah, M. F. Li, Daniel SH Chan, C. H. Ang, J. Z. Zheng, Y. Jin, and D. L.
Kwong, “Dynamic NBTI of PMOS transistors and its impact on device lifetime,” International
Reliability Physics Symposium, pp. 196-202, 2003.
15. B. C. Paul, K. Kang, H. Kufluoglu, M. A. Alam, and K. Roy, “Impact of NBTI on the temporal
performance degradation of digital circuits,” IEEE Electronic Device Letters, vol. 26, no. 8,
pp. 560-562, Aug. 2005.
16. R. Vattikonda, W. Wang, and Y. Cao, “Modeling and minimization of pmos nbti effect for
robust nanometer design,” IEEE/ACM Design Automation Conference, pp. 1047-1052, Jul.
2006.
17. M. Agarwal, B. C. Paul, Ming Zhang, and S. Mitra, “Circuit failure prediction and its
application to transistor aging,” VLSI Test Symposium, pp. 277-286, 2007.
18. V. Reddy, A. T. Krishnan, A. Marshall, J. Rodriguez, S. Natarajan, T. Rost, and S. Krishnan,
“Impact of negative bias temperature instability on digital circuit reliability,” International
Reliability Physics Symposium, pp. 248-254, 2002.
19. W. Wang, Z. Wei, S. Yang and Y. Cao, “An efficient method to identify critical gates under
circuit aging,” Int. Conference on Computer Aided Design, pp. 735-740, 2007.
20. A. T. Krishnan, F. Cano, C. Chancellor, V. Reddy, Q. Zhangfen, P. Jain, J. Carulli, J. Masin, S.
Zuhoski, S. Krishnan, and J. Ondrusek, “Product drift from NBTI: Guardbanding, circuit and
statistical effects,” Int. Electron Devices Meeting, pp. 4.3.1-4.3.4, 2010.
118 K.B. Sutaria et al.
21. W. Wang, S. Yang, S. Bhardwaj, R. Vattikonda, S. Vrudhula, F. Liu, Y. Cao, “The impact of
NBTI effect on combinational circuit: Modeling, simulation, and analysis,” IEEE Transactions
on VLSI Systems, vol. 18, no. 2, pp. 173-183, 2010.
22. H. Sangwoo, K. Juho, “NBTI-aware statistical timing analysis framework,” IEEE International
SOC Conference, pp.158-163, 2010.
23. E. Maricau and G. Gielen, “Computer-aided analog circuit design for reliability in nanometer
CMOS,” IEEE Transactions on Emerging and Selected Topics in Circuits and Systems, vol. 1,
no. 1, pp. 50-58, Mar. 2011.
24. A. R. Brown, V. Huard and A. Asenov, “Statistical simulation of progressive NBTI degradation
in a 45-nm technology pMOSFET,” IEEE Transactions on Electron Devices, vol. 57, no. 9, pp.
2320-2323, Sept. 2010.
25. S. Ogawa and N. Shiono, “Generalized diffusion-reaction model for the low-field charge-
buildup instability at the Si-SiO2 interface,” Physical Review B, vol. 51, no. 7, pp. 4218-4230,
Feb. 1995.
26. G. Chen, K. Y. Chuah, M. F. Li, D. S. H. Chan, C. H. Ang, J. Z. Zheng, Y. Jin, et al., “Dynamic
NBTI of PMOS transistors and its impact on device lifetime,” International Reliability Physics
Symposium, pp. 196-202, 2003.
27. S. Chakravarthi, A. T. Krishnan, V. Reddy, C. F. Machala and S. Krishnan, “A comprehensive
framework for predictive modeling of negative bias temperature instability,” International
Reliability Physics Symposium, pp. 273-282, 2004.
28. M. A. Alam, S. Mahapatra, “A comprehensive model of PMOS NBTI degradation,” Micro-
electronics Reliability, vol. 45, pp. 71-81, 2005.
29. V. Huard, C. R. Parthasarathy, C. Guerin, M. Denais, “Physical modeling of negative
bias temperature instabilities for predictive extrapolation,” International Reliability Physics
Symposium, pp. 733-734, 2006.
30. S. V. Kumar, C. H. Kim, and S. S. Sapatnekar, “An analytical model for Negative Bias
Temperature Instability (NBTI),” International Conference for Computer Aided Design, pp.
493-496, 2006.
31. W. Wang, V. Reddy, A. Krishnan, R. Vattikonda, S. Krishnan and Y. Cao, “Compact modeling
and simulation of circuit reliability for 65 nm CMOS technology,” IEEE Transactions on
Device and Materials Reliability, vol. 7, no. 4, pp. 509-517, Dec. 2007.
32. R. Tu, E. Rosenbaum, W. Chan, C. Li, E. Minami, K. Quader, et al., “Berkeley reliability tools –
BERT,” IEEE Transactions on Computer-Aided Design of Integrate Circuits and Systems, vol.
12, no. 10, pp. 1524-1534, Oct. 1993.
33. Reliability Simulation in Integrated Circuit Design, Cadence, 2003.
34. MOS Device Aging Analysis with HSPICE and CustomSim, Synopsys, 2011.
35. ELDO User’s Manual, Mentor Graphics, 2005.
36. V. Huard, M. Denais, “Hole trapping effect on methodology for DC and AC negative bias
temperature instability measurements in pMOS transistors,” International Reliability Physics
Symposium, pp. 40-45, 2004.
37. T. Grasser, B. Kaczer, W. Goes, H. Reisinger, T. Aichinger, P. Hehenberger, et al., “The
paradigm shift in understanding the bias temperature instability: from reaction-diffusion to
switching oxide traps,” IEEE Transactions on Electron Devices, vol. 58, no. 11, pp. 3652-
3666, Nov. 2011.
38. G. I. Wirth, R. da Silva and B. Kaczer, “Statistical model for MOSFET bias temperature
instability component due to charge trapping,” IEEE Transactions on Electron Devices, vol.
58, no. 8, pp. 2743-2751, Aug. 2011.
39. J. B. Velamala, K. B. Sutaria, T. Sato, Y. Cao, “Aging statistics based on trapping/detrapping:
silicon evidence, modeling and long-term prediction,” International Reliability Physics Sym-
posium, pp. 2 F.2.1-2 F.2.5, 2012.
40. T. Grasser, H. Reisinger, W. Goes, T. Aichinger, P. Hehenberger, P.-J. Wagner, M. Nelhiebel,
J. Franco, B. Kaczer, “Switching oxide traps as the missing link between negative bias
temperature instability and random telegraph noise,” International Electron Devices Meeting,
pp.1-4, 2009.
6 Compact Modeling of BTI for Circuit Reliability Analysis 119
41. G. I. Wirth, J. Koh, R. da Silva, R. Thewes, and Ralf Brederlow, “Modeling of statistical low-
frequency noise of deep-submicron MOSFETs,” Trans. on Electron Dev., vol. 52, pp. 1576-
1588, 2005.
42. A.P. van der Wel, E.A.M. Klumperink, J.S. Kolhatkar, E. Hoekstra, M.S. Snoeij, C. Salm,
H. Wallinga, B. Nauta, “Low-Frequency Noise Phenomena in Switched MOSFETs,” IEEE
Journal of Solid-State Circuits, vol. 42, no. 3, pp.540-550, March 2007.
43. M. J. Kirton and M. J. Uren, “Noise in solid-state microstructures: A new perspective on
individual defects, interface states and sow-frequency (1/f) noise,” Advances in Physics, vol.
38, p. 367-468, 1989.
44. G. Wirth, R. da Silva and R. Brederlow, “Statistical model for the circuit bandwidth dependence
of low-frequency noise in deep-submicrometer MOSFETs,” Trans. on Electron Devices, vol.
54, pp.340-345, Feb. 2007.
45. G. Wirth, R. da Silva, P. Srinivasan, J. Krick and R. Brederlow. “Statistical model for
MOSFET low-frequency noise under cyclo-stationary conditions,” International Electron
Devices Meeting, p.30.5.1-4, 2009.
46. B. Kaczer, T. Grasser, Ph. J. Rousse, J. Martin-Martinez, R. O’Connor, B. J. O’Sullivan,
and G. Groeseneken, “Ubiquitous relaxation in BTI stressing—new evaluation and insights,”
International Reliability Physics Symposium, pp. 20-27, 2008.
47. T. Grasser, H. Reisinger, P.-J. Wagner, F. Schanovsky, W. Goes and B. Kaczer, “The time
dependent defect spectroscopy (TDDS) for the characterization of the bias temperature
instability,” International Reliability Physics Symposium, pp. 16-25, 2010.
48. J. B. Velamala, K. B. Sutaria, H. Shimizu, H. Awano, T. Sato, G. Wirth, Yu Cao, “Compact
modeling of statistical BTI under trapping/detrapping,” IEEE Transactions on Electron
Devices, vol.60, no.11, pp.3645-3654, Nov. 2013
49. J. B. Velamala, K. Sutaria, H. Shimizu, H. Awano, T. Sato, Y. Cao, “Statistical aging
under dynamic voltage scaling: A logarithmic model approach,” Custom Integrated Circuits
Conference, pp. 1-4, 2012.
50. K. Sutaria, J. Velamala, V. Ravi, G. Wirth, T. Sato, Y. Cao, “Multilevel reliability simulation
for IC design,” pp. 719-749, in Bias Temperature Instability for Devices and Circuits, edited
by T. Grasser, Springer, 2014.
51. K. B. Sutaria, J. B. Velamala, C. Kim, T. Sato, Y. Cao, “Aging statistics based on trap-
ping/detrapping: Compact modeling and silicon validation,” IEEE Transactions on Device and
Materials Reliability, vol. 14, no. 2, pp. 607-615, June 2014.
52. M. Denais, C. R. Parthasarathy, G. Ribes, Y. Rey-Tauriac, N. Revil, A. Bravaix, V. Huard,
F. Perrier, “On-the-fly characterization of NBTI in ultra-thin gate oxide PMOSFET’s,”
International Electron Devices Meeting, pp. 109-112, 2004.
53. H. Reisinger, O. Blank, W. Heinrigs, A. Muhlhoff, W. Gustin, C. Schlunder, “Analysis of NBTI
degradation- and recovery-behavior based on ultra-fast VT-measurements,” International
Reliability Physics Symposium, 2006.
54. K. B. Sutaria, A. Ramakumar, R. Zhu, R. Rajeev, Y. Ma, Y. Cao, “BTI-induced aging under
random stress waveforms: Modeling, simulation and silicon Validation,” Design Automation
Conference, pp. 1-6, 2014.
Chapter 7
Circuit Resilience Roadmap
1 Introduction
Faults in integrated circuits can broadly result from four aspects: (a) design bugs
which originate from mistakes in the system specification or implementation, (b)
manufacturing problems which can range from variations of design parameters to
complete changes in topology, such as resistive shorts or opens, (c) aging problems
which arise over lifetime and also can range from a degradation of design parameters
to changes in the circuit topology, and (d) signal corruption which results from
intrinsic or extrinsic temporary disturbances.
All these faults may lead to errors and failures in the whole system. These we
can again categorize broadly: (a) permanent errors which relate to changes in the
topology, (b) soft errors which result from signal corruptions, such as a particle
strike changing the contents of a storage element, and (c) parametric errors, where
the circuit fails to meet its specification in terms of frequency of operation, power,
or similar metrics.
In this context, a circuit is more resilient if it is able to tolerate increased
occurrence of soft and parametric errors. For digital synchronous circuits, we
can easily evaluate their resilience, using standardized measures and metrics for
their performance and correctness. Consider for example a metric such as logic
functionality: we can easily state that the steady-state output voltage of an inverter
has to be the logic inverse of its input voltage. Similarly, for example a latch has to
keep its stored value until the next latching clock pulse.
Traditionally, hard and soft errors were distinguished in the following way. Hard
errors are associated with an incorrect behavior of the circuit under all conditions,
which results from a topological change of the circuit. In contrast, soft errors are
based on changes of electrical parameters of the circuit, which causes a faulty
behavior only under specific conditions. For example, a short or open inside a
CMOS inverter may cause its output to be stuck at logic one or zero, while a shift in
gate oxide thickness behaves more subtle, as it changes the time the inverter takes
to change its state given a switching input signal. This in turn, leads only to an error
if this change causes a path to violate one of its timing constraints.
However, excessive parametric variability can cause the circuit to behave in a way
which we would traditionally associate with a hard error. A prominent example for
this in current technologies can be found in static random access memory (SRAM),
where we use the smallest possible device due to high density requirements. These
small devices are extremely susceptible to various sources of variability, such as
Random Dopant Fluctuations (RDF) or Line Edge Roughness (LER) [1]. Such
excessive variability can lead to scenarios, where a particular SRAM cell cannot
be read or written under all practical operating conditions, although the underlying
topology might still be correctly manufactured [13].
In addition to parametric variability, which we discussed above, circuits—as
they get smaller—become more sensitive to various forms of noise. As the amount
of charge being held in a storage element or transported by a switching signal
decreases, the potential for a noise source, such as a striking particle, to disrupt
this charge increases.
These problems already exist in memories for several years, which has lead
to a huge variety of solutions and techniques to deal with them. Over the years,
various kinds of redundancy, parity checking, and error correction, were proposed
and are nowadays an essential part of any practical memory design. As we start to
face similar problems now with the remaining logic parts of the design, also here
several techniques have been proposed over the last years, such as RAZOR [2] or
BISER [22] to name just two prominent examples.
Our goal in this chapter is to discuss and analyze the ongoing trend of continued
technology scaling and its implications for circuit resilience. This continues earlier
work on predicting circuit resilience conducted in [10] and [14]. To demonstrate this
behavior on specific examples, we will focus on two basic circuit blocks. On the
one hand we analyze the behavior and resilience of different SRAM architectures.
7 Circuit Resilience Roadmap 123
As already mentioned above, SRAM cells represent the most susceptible parts of a
chip due to their minimal sizing. On the other hand we discuss trends of technology
scaling in logic standard cells. Standard cells in logic designs, such as inverters,
represent (compared to SRAM cells) the other extreme and are typically considered
to be the most resilient parts of the design.
In the following, Sect. 2 discusses general trends in technology, focusing on man-
ufacturing variability, aging degradation, and particle strikes. Section 3 discusses
the behavior of SRAM performances for different architectures and technologies.
Section 4 does the same for different logic standard cells. Section 5 concludes the
chapter.
2 Trends in Technology
There exist several technology related effects, which influence the final resilience of
the circuit and its individual components. For example, manufacturing variations,
which may be in their nature either random, such as process variations, or system-
atic, such as chemical-mechanical polishing, cause a static deviation, which remains
constant over the entire life-time of the chip. Aging effects cause an additional
degradation of these characteristics which increases over life-time. Finally, effects
such as particle strikes, disturb chip operation for small amounts of time, but this
possibly quite frequently.
This leads to line edge roughness (LER) which impacts for example the effective
gate length of all transistors. LER does not scale with device scaling and is almost
independent of the lithographic process.
Metal gates, which were introduced to overcome limitations of using polysilicon
gate electrodes, are also subject to another variability source called metal gate
granularity (MGG). Metal gate granularity refers to the random orientation of
the different metal grains and leads to variations in the gate work function. The
magnitude of this effect is primarily dependent on the number of metal grains inside
the gate. As the size of metal grains is a material-specific property [3], the number
of metal grains decreases with ongoing scaling, which increases the effect of MGG
in future technologies.
The number and position of dopant atoms in the channel is another common
source of process variability called Random Dopant Fluctuations (RDF). Due to
decreasing numbers of dopant atoms in the channel, this effects tends to have
more influence in CMOS technologies, as one dopant more or less means a large
difference in resulting device properties.
With ongoing technology scaling transistors are not anymore produced in their
common planar style, but more advanced techniques, such as FinFET evolved,
especially due to problems resulting from RDF. Figure 7.1 shows the variation in
threshold voltage dependent on doping concentration for a planar and a FinFET
transistor produced in a similar technology.
As FinFET transistors are able to tolerate low channel doping, they can be
produced with almost negligible amount of threshold voltage variation due to RDF.
Traditional planar CMOS technologies require higher doping concentrations and,
hence, impose a higher threshold variability due to RDF.
In FinFET the transistor channel is produced in a 3D fashion and the gate is
folded around the channel, which enables a better control of the channel as well
as low channel doping, but also introduces additional variation sources. Fin height
7 Circuit Resilience Roadmap 125
and fin thickness are additionally subject to global variations due to mask offsets.
Additionally fin thickness also suffers from line edge roughness. In contrast, fin
height is only dependent on film thickness and not on lithography and, hence, does
not vary due to LER.
Table 7.1 summarizes the main process parameter variations which are modeled
in this work. Global variations affect each transistor on the die in the same
way, hence, each transistor pair is fully correlated in this regard. Local variations
affect each transistor on the die individually, hence, each transistor pair is fully
uncorrelated.
2.2 Aging
Aging changes transistor parameters after the manufacturing process. These changes
are the consequence of a certain stress condition, which lead to changes in the
physical structures of the transistor. For example, Time Dependent Dielectric
Breakdown (TDDB) leads to additional conducting paths in the gate oxide, which
increase the gate leakage current. TDDB can be divided into a soft breakdown phase,
where this additional current leads to a degradation of the physical properties of the
transistor. The soft breakdown is typically followed by a hard breakdown, which
leads to a complete loss of oxide dielectric properties of the transistor. Another
effect is hot carrier (HC) degradation, which is caused by a large electric field
near the drain of the transistor channel. HC leads to a degradation of threshold
voltage and carrier mobility of the transistor. Bias Temperature instability (BTI)
has attracted much attention in the last years. As gate oxides become thinner than
4 nm, BTI-induced shift in Vth has become the dominant factor limiting device
lifetime and circuit reliability [19]. BTI can be distinguished into Negative Bias
Temperature Instability (NBTI), which occurs in p-channel transistors and Positive
Bias Temperature Instability (PBTI), which occurs in high-k dielectric n-channel
transistors. BTI requires that the transistor is operated in inversion, which means
for a PMOS transistor that its gate is negatively biased with respect to its drain.
126 V.B. Kleeberger et al.
The exact amount of Vth shift is a strong function of device parameters and circuit
operation conditions such as the switching activity. Figure 7.2 presents a prediction
of NBTI induced Vth degradation for future technologies based on a model from [9].
Logic upsets and value changes in high density memories can be induced by charge
collection from ion tracks [21]. Soft error transients (SETs) are momentary current
or voltage disturbances in a circuit caused by energetic particle strikes (see Fig. 7.3).
Such an SET may propagate through subsequent circuitry, eventually reaching
a memory element or latch and causing a logic failure. The sensitivity of digital
ICs to SETs is rapidly increasing with aggressive technology scaling. Due to
decreasing parasitic capacitances and operating voltage, the critical charge Qc ,
which is required to upset a storage cell also decreases. Figure 7.4 shows the
dependence of Qc on supply voltage for different technology nodes in a 6T SRAM
cell, which can be found using a binary search and the simulation setup from
Fig. 7.3. When entering the f C region for the critical charge, as in current logic
and SRAM devices, lighter particles such as alpha and proton particles become
7 Circuit Resilience Roadmap 127
dominant (see Fig. 7.5). This increases not only error rates, but also their spread,
as the range of lighter particles is much longer compared to residual nucleus [6].
For a scaled transistor, the surroundings of the drain junction are the most
sensitive sites to SET, e.g., the drain junction of the inverter and internal storage
nodes in a SRAM cell. For alpha particles, the charge deposition is modeled by a
double-exponential current pulse (i.e., funneling current) [12]:
where I0 is the maximum charge collection current; ˛ is the charge collection time
constant and ˇ is the time constant for the rising edge of the current pulse, which
is mainly determined by the energy of the particle.
Assuming the device is a bulk CMOS transistor, the dependence of I0 and ˛ on
device parameters is expressed as:
I0 D qNE0 (7.2)
and
2.0x10-12
-5
6.0x10 3.0x10-11
1.5x10-12
-11
tβ (s)
2.5x10
4.0x10-5
I0 (A)
tα (s)
1.0x10-12
-11
2.0x10
-13
5.0x10
2.0x10-5 -11
1.5x10
0.0
12 16 22 32 45 12 16 22 32 45
Technology (nm)
Fig. 7.6 Maximum charge collection current I0 , and collection time constants ˛ and ˇ as a
function of CMOS technology
where N is the doping concentration and is the effective mobility. With the
scaling of CMOS technology, the product of N increases, even though the effective
mobility decreases due to higher channel doping concentration. Hence, I0 increases
and ˛ decreases when the device moves to a smaller technology node, as shown in
Fig. 7.6. However, ˇ is relatively independent on device parameters (Fig. 7.6). It is
also much smaller compared to ˛ [12].
3 Trends in Memory
In this section, we first analyze different design options for SRAM cells based
on a 14 nm FinFET technology extracted from TCAD simulation. Second, we
demonstrate resilience trends for FinFET-based SRAM architectures down to 7 nm
with the help of Predictive Technology Models (PTM) [16, 18]. For this purpose we
have chosen two common SRAM architectures, namely the 6-transistor (6T) and
8-transistor (8T) bit cell shown in Fig. 7.7.
The SRAM is well known to already have high failure rates in current technologies.
Figure 7.7 shows both architectures we analyzed. For the 6T architecture we have as
design choices the number of fins for the pull-up transistors (PU), the number of fins
for the pull-down transistors (PD) and the number of fins for the access transistors
(PG). The resulting architecture choice is then depicted by 6T_(PU:PG:PD). For
7 Circuit Resilience Roadmap 129
Fig. 7.7 Circuit schematics for standard 6T (a) and 8T (b) SRAM bit cells
the 8T architecture we have additionally two transistors for the read access (PGR).
Hence, the corresponding architecture choice is named 8T_(PU:(PG:PGR):PD).
As any other circuit an SRAM cell can fail in many different ways, for
example:
• Read delay failure: An SRAM cell cannot be read within a specified time.
• Write Trip Voltage (WTV) failure: The voltage swing during a write is not high
enough at the SRAM cell.
• Static Voltage Noise Margin (SVNM) failure: An SRAM cell can be flipped
unintentionally, when the voltage noise margin is too low (stability).
• Soft Error/Single Event Upset (SEU) failure: If the critical charge Qcrit is low,
the susceptibility to a bit flip caused by radiation is higher.
For the design parameters we have extended this list by:
• Write delay: Write a new content into the cell within a specified time.
• Dynamic power (SW): The power measured during a new content write
(switching).
• Leakage power: Standby leakage with zero and one content averaged (word line
set to zero).
We measure the write delay in our experiments, by first initializing the SRAM
content with a one, setting the bit lines such that they will attempt to write a zero
into the cell, pulse the word line, and measure the delay until the internal cell node
takes on the new value.
For the complete SRAM array architecture we assumed 128 cells connected to
one column (nrows = 128).
130 V.B. Kleeberger et al.
b
a
Fig. 7.8 N-Curve extraction circuit (a) and resulting N-Curve (b)
We identified four parameters, namely Read delay, WTV, SVNM and Qcrit as
resilience key parameters.
To measure the SVNM and WTV we used the N-Curve metric [20]. This metric
finds the minimal amount of voltage that is required to change the cell’s content.
The extraction procedure for the N-curve is as follows (see also Fig. 7.8):
1. The bitlines BL and BR and the wordline W L are clamped to VDD
2. A DC voltage source Vi n is attached to the node R
3. The voltage of Vi n is swept from VSS to VDD and the current Ii n through the
voltage source is measured.
This then results on the curve shown in Fig. 7.8b. The voltage difference between
the points A and B in the curve is the maximum tolerable DC voltage noise or the
static voltage noise margin (SVNM) [5]. Similarly, the voltage difference between
the points C and B is the voltage drop required to flip the internal node of the cell,
also called the write-trip voltage (WTV).
For measuring the read delay, we initialize the SRAM bit with a one, initialize
both bit lines to the supply voltage, leave them floating (high-Z), and pulse the
wordline. Then we measure the delay as the time the bit line node requires to drop
to 2/3 of the supply voltage.
Figure 7.9a shows the impact of the number of rows used in one column on
the read delay for various 6T and 8T architectures. The 6T and 8T one fin for all
transistors designs are very similar (in fact their read delays are equal) and show
the largest impact, while the double sized 8T cell is almost as good as the high-
performance 6T from Intel (6T_(1:2:3)) [8]. Figure 7.9b shows the impact of VDD
on the SVNM. We see that 8T cells show for both models PTM 20 nm and TCAD
14 nm a very similar trend as they have nearly identical P/N ratios. The SVNM of
the 8T cell relies only on the intrinsic feedback of the storage inverter. The 6T cell
7 Circuit Resilience Roadmap 131
a 60
b 500
6T_(1:1:1) 6T_1:1:1_TCAD_14nm
8T_1:(1:1):1 450 8T_1:(1:1):1_TCAD_14nm
50 6T_1:1:1_PTM_20nm
8T_2:(2:2):2 400 8T_1:(1:1):1_PTM_20nm
Read delay [ps]
6T_(1:2:3)
SVNM [mV]
40 350
30 300
250
20
200
10 150
0 100
nrows:128 nrows:256 nrows:512 500 550 600 650 700 750 800 850 900 950 1000
VDD [mV]
Fig. 7.9 Impact of number of rows on read delay for a 14 nm TCAD model (a) and SVNM
comparison between 6T and 8T cells using different models (b)
behavior differs between TCAD based model and PTM. This is due to the 100 mV
lower threshold voltage (Vth ) of the PTM models, which reduces the SVNM.
For Qcrit measurements we use the setup shown in Fig. 7.3b. Basically, we
measured the amount of charge needed to flip the cell, i.e., to corrupt the data.
We observed that there is no single cell, which provides the best solution under any
requirement specification. Therefore, also Intel presented three different sizings for
the 6T cell [8]. There is a need for at least a high-density cell and a high-performance
cell. So we included in our study the three flavors of Intel’s 6T cell and several 8T
design alternatives to show what might be achievable.
First, we compared various 6T and 8T architectures by an exhaustive analysis to
quantify the design influence.
In Fig. 7.10 different SRAM design options are evaluated for the resilience
key parameters normalized to the respective values of a 6T_(1:1:1) minimal sized
cell. In general, higher values indicate a performance improvement (e.g. +10 %).
Obviously, it is not good idea to use the design options 6T_(1:2:1) and 6T_(2:2:1),
as they decrease the SVNM by 30 %. For 8 fin designs the 8T_(1:(1:1):1) cell might
be a much better solution, which offers 22 % SVNM improvement. But also the
6T_(1:1:2) used by Intel as high density low voltage design has an increased SVNM
(10 %). When it comes to high performance cell options, the Intel variant 6T_(1:2:3)
shows the fastest read delay, followed by many 8T designs and some 6T cells.
However, if we want to lower the soft error rate, despite of increasing the voltage
supply, the best SRAM design option is the 8T_(2:(2:2):2) cell, which unfortunately
occupies a very large area.
132 V.B. Kleeberger et al.
120%
110%
100%
90%
80% Read delay
70% Qcrit
SVNM
60% WTV
50%
40%
30%
20%
10%
0%
-10%
-20%
-30%
8T_2:(1:1):1
8T_1:(2:1):2
8T_1:(1:2):2
8T_2:(2:2):2
8T_1:(2:2):3
6T_1:2:1
6T_1:2:2
6T_2:2:2
6T_1:2:3
6T_2:1:1
6T_1:1:2
8T_1:(1:1):1
6T_2:2:1
8T_1:(1:1):2
8T_1:(2:2):1
8T_2:(2:2):1
8T_1:(2:2):2
8T_1:(3:2):2
8 FINS 10 FINS 12 FINS 14 FINS 16 FINS
Fig. 7.10 Comparison of read delay, SVNM, WTV and Qcrit for different design variants
normalized to the results of the 6T_(1:1:1) bit cell
Write delay
25% Leakage (avg)
Power (SW)
0%
-25%
-50%
-75%
-100%
-125%
-150%
6T_2:1:1
6T_1:1:2
6T_1:2:1
8T_1:(1:1):1
6T_2:2:1
6T_1:2:2
8T_2:(1:1):1
8T_1:(1:1):2
6T_2:2:2
6T_1:2:3
8T_1:(2:2):1
8T_1:(2:1):2
8T_1:(1:2):2
8T_2:(2:2):1
8T_1:(2:2):2
8T_2:(2:2):2
8T_1:(2:2):3
8T_1:(3:2):2
Fig. 7.11 Comparison of write delay, leakage, and switching power (SW) for different design
variants normalized to the results of the 6T_(1:1:1) bit cell
In Fig. 7.11 we plotted the additional design parameters. We see that all 8T
cells have much increased leakage power, while Intel’s 6T_(1:2:3) offers a 10 %
decreased write delay with an moderate increase of dynamic power consumption.
However, the leakage power increase is for this option also above 100 % of a
6T_(1:1:1) cell. An interesting alternative design option with 12 fins could be the
8T_(1:(2:2):1) cell, as it shows 25 % decreased write delay with no impact on
dynamic power. The leakage power increase of this variant could be controlled by
lowering VDD in standby or by active backward body biasing.
7 Circuit Resilience Roadmap 133
a 30 10 b 30 10
9 9
3σ (% of Median Value)
3σ (% of Median Value)
25 8 25 8
20 7 20 7
6 6
15 5 15 5
4 4
10 3 10 3
5 2 5 2
1 1
0 0 0 0
20 16 14 10 7 20 16 14 10 7
Technology Node [nm] Technology Node [nm]
6T_(1:1:1) cell 8T_(1:(1:1):1) cell
Fig. 7.12 Impact of VDD variation (˙10 %) in absolute values (box plots) and relative increase
compared to the median (dashed line) for a 6T_(1:1:1) high density (a) and a standard
8T_(1:(1:1):1) cell (b)
From a design perspective there is no clear winner for all evaluated design
variants. The best choice depends on the individual requirements and priorities,
which can be either high density, high robustness or high performance.
In this section we want to analyze different scaling related trends for SRAM cells
using PTM SPICE models from [18]. First of all we want to analyze the impact of
VDD variations. Afterwards, we quantify trends considering transistor technology
variations as described in Sect. 2.1.
To evaluate the different technology nodes we run Monte-Carlo (MC) simula-
tions with 1,000 samples to get an impression of the general trend.
We quantified the impact of supply voltage noise by assuming a maximal VDD
variation of ˙10 % for a 3 yield. Figure 7.12 shows the corresponding results of
the respective Monte Carlo simulations. We recognize, as expected, an increasing
influence of VDD on the read delay for the smaller nodes (up to 9.4 % for the 7 nm
node). Additionally, we observe a very similar trend for the 6T and 8T architectures.
To evaluate the scaling trends of the chosen resilience key parameters when
scaling down to the 7 nm node, we run MC simulations considering process
variations. Figures 7.13 and 7.14 show box plots and the percentage deviation for
a 3 yield with respect to median values for a 6T_(1:1:1) high density and an
8T_(1:(1:1):1) SRAM cell, respectively.
Although the read delay decreases with technology scaling (see Figs. 7.13a and
7.14a), which theoretically enables a higher working frequency, its relative 3
variation becomes as high as 50 % at the 7 nm node. This compromises its robustness
and cripples the possible increase in frequency.
134 V.B. Kleeberger et al.
a 35 60
b 800 35
3σ (% of Median Value)
3σ (% of Median Value)
700
Read delay [ps]
30 50 30
25 600 25
Qcrit [aC]
40 500
20 20
30 400
15 300 15
10 20 10
200
5 10 100 5
0 0 0 0
20 16 14 10 7 20 16 14 10 7
Technology Node [nm] Technology Node [nm]
c 600 70 d 500 100
3σ (% of Median Value)
3σ (% of Median Value)
500 60
400 80
SVNM [mV]
WTV [mV]
400 50
40 300 60
300
30 200 40
200 20
100 100 20
10
0 0 0 0
20 16 14 10 7 20 16 14 10 7
Technology Node [nm] Technology Node [nm]
Fig. 7.13 Scaling trends for a 6T_(1:1:1) high density SRAM cell: (a) read delay, (b) Qcrit , (c)
WTV and (d) SVNM
a b
35 60 800 35
3σ (% of Median Value)
3σ (% of Median Value)
700
Read delay [ps]
30 50 30
25 600 25
Qcrit [aC]
40 500
20 20
30 400
15 300 15
10 20 10
200
5 10 100 5
0 0 0 0
20 16 14 10 7 20 16 14 10 7
Technology Node [nm] Technology Node [nm]
c 600 60 d 600 30
3σ (% of Median Value)
3σ (% of Median Value)
500 50 500 25
SVNM [mv]
WTV [ mV]
400 40 400 20
300 30 300 15
200 20 200 10
100 10 100 5
0 0 0 0
20 16 14 10 7 20 16 14 10 7
Technology Node [nm] Technology Node [nm]
Fig. 7.14 Scaling trends for a 8T_(1:(1:1):1) SRAM cell: (a) read delay, (b) Qcrit , (c) WTV and
(d) SVNM
7 Circuit Resilience Roadmap 135
The critical charges Qcrit shows a similar trend, where its decrease with technol-
ogy scaling is accompanied by a noticeable increase in 3 variability (around 30 %
at 7 nm). Nevertheless, the 8T_(1:(1:1):1) design provides an approximately 10 %
higher Qcrit than the 6T_(1:1:1), and a slightly smaller 3 variability. Regarding
WTV, a quick examination of Fig. 7.7 shows that both designs are identical in terms
of writing procedure, which is exemplified in Figs. 7.13c and 7.14c. In both cases
their 3 variability increases from 40 % at 20 nm to nearly 58 % at 7 nm. Such
high variations have to be taken into account to guarantee enough voltage swing
during a write procedure to flip the cell’s content as expected. Perhaps the biggest
advantage of the 8T_(1:(1:1):1) design over the 6T_(1:1:1) version with respect to
the resilience key parameters presented here, is its much improved SVNM. Not only
reaches the 8T design an approximately 22 % higher SVNM than the 6T design, but
it is also much more robust in terms of 3 variability (28 % for 8T 7 nm compared
to 90 % for 6T 7 nm).
This overall analysis of the resilience key parameters presented here (read
delay, Qcrit , WTV, SVNM) shows clearly that the variability increases rapidly as
technology is scaled down, which has to be taken into account when designing
robust SRAM cells.
4 Trends in Logic
Before making predictions regarding the scaling trend of FinFET standard cells,
we first want to evaluate in detail how process variations influence the behavior of
FinFET-based logic.
Fig. 7.16 Q-Q plot—delay and slope. (a) NAND—delay. (b) NOR—delay. (c) NAND—slope.
(d) NOR—slope
To evaluate these cells we run a Monte-Carlo simulation with 1,000 samples and
measure delay, slope, and leakage power for every input timing arc. Figures 7.16
and 7.17 show the corresponding Q-Q plots.
Obviously, delay is very well standard normal distributed as we already expect it
to be from bulk technologies. For the slope inverse (i.e., switching time) this holds
also for most timing arcs. Only for the timing arcs where the entire transistor stack
is involved (e.g., Br ! Zf for the NAND) this property does not hold. Figure 7.17
shows the lognormal Q-Q plot for leakage power. Leakage variability fits very well
a lognormal distribution for most samples. However, the tail seems to not fulfill this
assumption, hence, for accurate yield predictions more advanced distribution types
will probably be required here.
Additionally we conduct a worst case analysis as described by [4]. Based on
the gradient pointing to the worst-case point in the process parameter space, we
7 Circuit Resilience Roadmap 137
Fig. 7.18 Delay and leakage variability—NAND. (a) Variability sources. (b) Global vs. local
compute the contribution of the different variability sources as well as the single
transistors compared to global variability (Figs. 7.18 and 7.19).
In both cell types, metal gate granularity is the major variability source for delay
and leakage. Besides, fin thickness also contributes considerably, especially for
leakage. The contribution of random dopant fluctuations is well suppressed due to
the low channel doping. Most variability is caused by local variations, which can
again be attributed to the dominance of metal gate granularity, followed by line
edge roughness effects.
138 V.B. Kleeberger et al.
Fig. 7.19 Delay and leakage variability—NOR. (a) Variability sources. (b) Global vs. local
Fig. 7.21 Scaling of minimal-sized inverter performances. (a) P/N ratio. (b) Delay. (c) Slope.
(d) Input capacitance
transistor types. The delay of the FO4 inverter initially scales with a very steep slope,
but after some generations this trend quickly reverts to an less steep scaling as we
know from planar technologies. Interestingly, slope scales very steeply in FinFET
technology. Input capacitance is in general much higher for FinFET compared to
planar technologies in the same technology node. Both, the scaling of slope and the
value of the capacitance, can be very well explained by the three-dimensional gate
controlling the channel.
To evaluate the scaling trend in FinFET technologies in the presence of variabil-
ity, we size a mid-performance inverter and evaluate its behavior under different
specifications, which are listed in Table 7.2. Specification 0 represents nominal
conditions without any variability. In each subsequent specification one additional
variation source is added: Process variations in spec 1, device temperature variation
in spec 2, and supply voltage variation in spec 3.
140 V.B. Kleeberger et al.
Fig. 7.22 Scaling trends for timing performances of a medium-sized inverter. (a) Nominal delay.
(b) Delay degradation. (c) Nominal slope. (d) Slope degradation
Figure 7.22 shows the change of timing performances (delay and slope) over
the respective technology nodes, while Fig. 7.23 shows the change in power
performances (leakage power and switching energy).
Delay degradation due to process variations increases from 17 % up to 30 %
for a 3 -design. A detailed analysis reveals that from this increase about 5 % can
be accounted to an increase of sensitivity to process variation due to technology
scaling. The remaining larger increase results mostly from the increase of gate
work function variability due to metal gate granularity. This corroborates that metal
gate granularity is the major source of process variability in FinFET as metal grain
size does not scale. Another interesting insight is that the influence of temperature
increases over the studied technologies. While we observed in this study almost no
7 Circuit Resilience Roadmap 141
Fig. 7.23 Scaling trends for power performances of a medium-sized inverter. (a) Nominal leakage.
(b) Leakage degradation. (c) Nominal switching energy. (d) Switching energy degradation
5 Conclusion
Acknowledgements The authors want to gratefully acknowledge the contributions of Sani Nassif,
Nikhil Mehta, Helmut Graeb, and Javier Varela to this work. Veit Kleeberger was employed at the
Institute for Electronic Design Automation (TUM) during this work. This work was supported
in part by the German Research Foundation (DFG) as part of the priority program “Dependable
Embedded Systems” (SPP 1500—http://spp1500.itec.kit.edu).
References
1. Asenov, A.: Random dopant induced threshold voltage lowering and fluctuations in sub-0.1 m
mosfet’s: A 3-d atomistic simulation study. IEEE Trans. Electron Devices 45(12), 2505–2513
(1998)
2. Austin, T., Blaauw, D., Mudge, T., Flautner, K.: Making typical silicon matter with Razor.
IEEE Computer 37(3), 57–65 (2004)
3. Dadgour, H., Endo, K., De, V., Banerjee, K.: Modeling and analysis of grain-orientation effects
in emerging metal-gate devices and implications for SRAM reliability. In: IEEE International
Electron Devices Meeting (IEDM), pp. 1–4 (2008)
4. Graeb, H.: Analog Design Centering and Sizing. Springer Verlag (2007)
5. Grossar, E., Stucchi, M., Maex, K., Dehaene, W.: Read stability and write-ability analysis of
SRAM cells for nanometer technologies. IEEE Journal of Solid-State Circuits 41(11), 2577–
2588 (2006)
6. Ibe, E., Chung, S.S., Wen, S., Yamaguchi, H., Yahagi, Y., Kameyama, H., Yamamoto, S.,
Akioka, T.: Spreading diversity in multi-cell neutron-induced upsets with device scaling. In:
IEEE Custom Integrated Circuits Conference (CICC), pp. 437–444 (2006)
7. International technology roadmap for semiconductors (2013)
8. Jan, C.-H. et al.: A 22nm SoC Platform Technology Featuring 3-D Tri-Gate and High-k/Metal
Gate, Optimized for Ultra Low Power, High Performance and High Density SoC Applications.
In: IEEE International Electron Devices Meeting (IEDM), pp. 44–47 (2012)
9. Kleeberger, V.B., Barke, M., Werner, C., Schmitt-Landsiedel, D., Schlichtmann, U.: Compact
Model for NBTI Degradation and Recovery under Use-Profile Variations and ist Applica-
tion to Aging Analysis of Digital Integrated Circuits. Microelectronics Reliability 54(6–7),
pp. 1083–1089 (2014)
10. Kleeberger, V.B., Graeb, H., Schlichtmann, U.: Predicting Future Product Performance:
Modeling and Evaluation of Standard Cells in FinFET Technologies. In: ACM/IEEE Design
Automation Conference (DAC) (2013)
11. Lu, D.D., Lin, C.H., Niknejad, A.M., Hu, C.: Compact Modeling of Variation in FinFET
SRAM Cells. IEEE Design & Test of Computers 27(2), 44–50 (2010)
12. Messenger, G.C.: Collection of charge on junction nodes from ion tracks. IEEE Trans. Nuclear
Science, vol. 29, issue 6, pp. 2024–2031 (1982)
13. Nassif, S.R., Kleeberger, V.B., Schlichtmann, U.: Goldilocks failures: Not too soft, not too
hard. In: IEEE International Reliability Physics Symposium (IRPS) (2012)
14. Nassif, S.R., Mehta, N., Cao, Y.: A resilience roadmap. In: IEEE Conference on Design,
Automation & Test in Europe (DATE) (2010)
15. Pehl, M.: Discrete Sizing of Analog Integrated Circuits. Ph.D. thesis, Technische Universität
München (2012)
16. Predictive technology model (ptm). available at http://www.eas.asu.edu/~ptm
7 Circuit Resilience Roadmap 143
17. Saha, S.K.: Modeling process variability in scaled CMOS technology. IEEE Design & Test of
Computers 27(2), 8–16 (2010)
18. Sinha, S., Yeric, G., Chandra, V., Cline, B., Cao, Y.: Exploring sub-20nm FinFET design with
predictive technology models. In: ACM/IEEE Design Automation Conference (DAC) (2012)
19. Wang, W., Reddy, V., Krishnan, A.T., Vattikonda, R., Krishnan, S., Cao, Y.: Compact modeling
and simulation of circuit reliability for 65nm cmos technology. IEEE Trans. on Device and
Materials Reliability 7(4), 509–517 (2007)
20. Wann, C., Wong, R., Frank, D., Mann, R., Ko, S.B., Croce, P., Lea, D., Hoyniak, D., Lee, Y.M.,
Toomey, J., et al.: SRAM cell design for stability methodology. In: International Symposium
on VLSI Technology (VLSI-TSA-Tech), pp. 21–22 (2005)
21. Wirth, G., Vieira, M., Neto, E., Kastensmidt, F.: Generation and propagation of single event
transients in cmos circuits. In: Design and Diagnostics of Electronic Circuits and systems, 2006
IEEE, pp. 196–201 (2006)
22. Zhang, M., Mitra, S., Mak, T., Seifert, N., Wang, N.J., Shi, Q., Kim, K.S., Shanbhag, N.R.,
Patel, S.J.: Sequential element design with built-in soft error resilience. IEEE Transactions on
Very Large Scale Integration (VLSI) Systems 14(12), 1368–1378 (2006)
23. Zhao, W., Cao, Y.: New generation of predictive technology modeling for sub-45nm early
design exploration. IEEE Trans. Electron Devices 53(11), 2816–2823 (2006)
Chapter 8
Layout Aware Electromigration Analysis
of Power/Ground Networks
1 Introduction to Electromigration
D. Li
Qualcomm Inc, 5775 Morehouse Dr, San Diego, CA 92121, USA
e-mail: dianl@qti.qualcomm.com
M. Marek-Sadowska ()
University of California, Santa Barbara, CA 93106, USA
e-mail: mms@ece.ucsb.edu
S.R. Nassif
Radyalis, 7000 North Mopac Expressway, Suite 200, Austin, TX 78731, USA
e-mail: srn@radyalis.com
a b
extrusion
void
extrusion which will grow in size over time and eventually form an open or short
circuit as shown in Fig. 8.1.
In reality, void formation [3] occurs much faster than extrusion growth, which
is very unlikely to occur [4]. Therefore voids are considered the major EM-
caused failures. In this chapter, we only focus on the EM-induced void damage
of interconnects.
The ion movement is caused by several forces, the two main ones are: (1) The
direct electrostatic force Fe caused by the electric field, and (2) The force from the
momentum transfer with moving electrons Fp , which is in the opposite direction to
the electric field [5]. The resulting force Fres acting on an ion is given by (8.1).
Fres D Fe Fp D q Ze C Zp E D qZ j (8.1)
In (8.1), q is the electron charge, Ze is the nominal valence of the metal, Zp accounts
for the magnitude and direction of the momentum exchange between the conducting
electrons and the metal ions, Z* is the effective valence, E is the electrical field, j is
current density and is the metal resistivity.
Black’s Equation is a mathematical model for the mean time to failure (MTTF) of a
wire due to electromigration [6]. The equation is given by (8.2).
Ea
M T TF D Aj n e . kT / (8.2)
experimental data taken at elevated temperature and stress levels in short periods of
time to the expected component failure rates under actual operating conditions.
Da jZ j ej
vd D (8.3)
kT
I.A. Blech observed that the metal ions stopped moving, when the wire length was
reduced to a certain length. Also, he observed that no ion drift could be detected
below a threshold current density.
These observations can be explained by considering the flux due to electromigra-
tion and the gradient of the chemical potential via a gradient of mechanical stress
[7–10] according to (8.4)
D v Cv @
Jv D jZ j ej (8.4)
kT @x
where ˝ is the atomic volume, and is the hydrostatic stress. This equation shows
that a gradient of mechanical stress acts as driving force against electromigration.
Thus, electromigration stops, when the opposing stress gradient, commonly referred
to as “back stress” [11], equals the electromigration driving force, so that Jv D 0.
This steady-state condition is the so-called “Blech Condition”, given by (8.5)
@ jZ j ej
D (8.5)
@x
148 D. Li et al.
jZ j ej
.x/ D 0 C x (8.6)
where 0 is the stress at x D 0. This equation shows that the stress varies linearly
along the line, when the backflow flux equals the electromigration flux. Given that
the maximum stress a conductor line can withstand is th , the critical product for
electromigration failure can be stated as (8.7)
.th 0 /
.jL/c D (8.7)
jZ j e
This is the so-called “Blech Product”. The critical product provides a measure of
the interconnect resistance against electromigration failure and several experimental
works have reported that the critical product for modern copper interconnects is in
the range from 2,000 to 10,000 A/cm [12–15].
Current density of a wire is a ratio of the total current flowing through that wire, I,
and its cross-section area, A (8.8).
I
j D : (8.8)
A
According to Black’s equation, MTTF of a wire is inversely proportional to its
current density. And their correlation depends on the current exponent n. There is
some controversy across EM literature regarding the exact value of n. A common
agreement is that the typical value of n is between 1 and 2. Interestingly, the most
important factor that determines the value of n is the current density itself. The
value of n is close to 1 when the current density is small, and it gradually increases
to two as the current density increases. For example, in [16], the authors report
n D 1 for j 0.1 MA/cm2 , n D 1.5 for 0.1 MA/cm2 < j < 1 MA/cm2 and n D 2 for
j 1 MA/cm2 ; in [17], the authors report n D 1.1 ˙ 0.2 for j 2.5 MA/cm2 and
n D 1.8 for j > 2.5 MA/cm2 . For n D 1, the time-average current density can be used
in (8.2) and the current waveform shape does not affect EM lifetime; else if n > 1,
transient current density should be considered and the shape of a current waveform
matters.
8 Layout Aware Electromigration Analysis of Power/Ground Networks 149
1.0
Ea=0.7eV
Normalized MTTF 0.8
0.6
0.4
0.2
0
300 310 320 330 340 350 360
Temperature (K)
1.3.2 Temperature
Wire length affects two parameters: jlimit and jeffect . According to Blech length effect,
there is a wire length limit that any wire shorter than that limit is considered EM-
immortal. From a different perspective, it can be also interpreted as given a wire
length, there is a current density limit that prevents the EM on the wire, as stated
in (8.9).
.th 0 /
jlim it D (8.9)
jZ j eL
Since the Blech length effect is caused by the mechanical stress buildup with a
reverse migration process, even if the current density on a wire is greater than jlimit ,
the reverse stress still exists. Therefore, the effective current density jeffect that is
150 D. Li et al.
a I1
I2
Ivia= I1+I2 = I3+I4
I3 I4
b c d
I1=1A I1=1.2A I1=1.9A
Fig. 8.4 Via-array current distribution. (a) 3D orthogonal wire structure with 4-by-4 via-array. (b)
Even distribution (c) uneven distribution (d) extreme uneven distribution
The interconnect structure that affects EM mainly refers to the via-array design. For
a single via, all current is flowing through the via and its current density can be
easily obtained. However in a via-array, the current is not evenly distributed into
each via. References [20, 21] discuss this effect.
In Li and Guan [22], a 3D orthogonal wire structure with a 4-by-4 via-array
is built as shown in Fig. 8.4a. Several current configurations are applied to the
structure, and current densities through the horizontal cross-sectional plane of
multiple vias are recorded as shown in Fig. 8.4b–d. It can be observed that when
all wire currents are identical as in Fig. 8.4b, current distribution into the vias is also
uniform. Greater differences among the wire segment currents cause the current
split into the vias to be more non-uniform.
This effect can be explained by the differences among path resistances through
different vias. A SPICE simulation on a pure resistive mesh validates this observa-
8 Layout Aware Electromigration Analysis of Power/Ground Networks 151
tion. Based on the resistive mesh model, the current distribution into each via can
be determined and the EM lifetime of the via-array can be estimated. A detailed
discussion can be found in [22].
anode
cathode Cu+
e- e- barrier
e- void anode layer
grain
boundary surface e- cathode
bulk
interface barrier Cu+
void
layer
atoms become separated from the grain boundaries/interfaces and are transported
in the direction of the current. Figure 8.6 shows the EM diffusion paths in a dual-
damascene copper interconnect structure, which is the main metallization process
in modern semiconductor fabrication [26].
In a dual-damascene structure, only a single metal deposition step is used to
simultaneously form the main metal line and the via beneath it. After the via and
trench recesses are etched, the via is filled in the same metal-deposition step that
fills the trench. Therefore, vias are always in the same copper (Cu) body as the
upper level metal, but are separated from the lower level metal by a barrier layer
which prevents copper diffusion into the dielectric as shown in Fig. 8.6. Typical
barrier layer materials are Ta, TaN, TiN and TiW. The diffusion processes caused by
electromigration can be divided into grain boundary diffusion, bulk diffusion and
surface diffusion. In general, surface diffusion is dominant in copper interconnects.
The diffusion path differences can be explained by their different activation energies
listed in Table 8.1.
2 Physical Simulation of EM
2.1 Background
The governing equation which describes the atom concentration evolution through-
out an interconnect segment, is the conventional mass balance (continuity) equation
in (8.11) [1].
@N .x; t /
Cr J D0 (8.11)
@t
In (8.11), N(x, t) is the atom concentration at the point with coordinates x D (x, y, z)
at the moment of time t, and J is the total atomic flux at this location. The total
atomic flux J is a combination of the fluxes caused by the different atom migration
forces. The major forces are induced by the electric current, and by the gradients of
temperature, mechanical stress and concentration: J D JE C Jth C JS C JC .
To define the fluxes mentioned above, we have (8.12).
8!
ˆ
ˆ J E D kTN
eZ jD0 e kT
Ea
ˆ
ˆ !
ˆ
ˆ
D E
0 kTa
ˆ
<! J th D NQ kT 2
e grad T
N D0 E a
J s D kT e kT gradH (8.12)
ˆ
ˆ !
ˆ
ˆ
Ea
J D D0 e kT gradN
ˆ
ˆ dNc
:̂ ! !
!
!
dt
C d i v J E C J th C J s C J c D 0
In (8.12), JE , Jth , JS and JC are atomic fluxes induced by electrical, thermal, stress
and atomic concentration forces, respectively. N is the atomic concentration, Q* is
the specific heat of transport of metal, and H is the local hydrostatic stress. EM
lifetime is considered to be proportional to atomic flux divergence (AFD D divJ), so
we have here MTTF / AFD.
154 D. Li et al.
The 3D interconnect structure used for simulation is a copper wire segment with
a via below the wire. The wire geometry is specified by its 5 m width, 100 m
length and 1.5 m height; the via is of 2.5 m size and is 1.5 m off the wire edge;
the initial temperature is set to 300 K; the activation energy is set to 0.7 eV for the
copper body and is set to 1.25 eV along the surface to mimic the behavior of the
interface between the copper body and the cap layer. The current is flowing from
the via bottom up to the wire, then to the other end of the wire. A 3D view is shown
in Fig. 8.7.
We build an EM physical simulator using ANSYS [29] and a CCC program.
ANSYS is a multi-physics FEM-based tool, which provides AFD distribution for
specified physical parameters including temperature, current density and stress. The
whole simulation time is divided into many small time steps 4t. During each time
step, ANSYS simulates the wire structure and CCC program calculates the AFD
using (8.12), then the AFD is fed back into ANSYS for simulation of the next time
step. This ANSYS/CCC loop continues until an EM failure state is reached (e.g.
10 % resistance increase).
Figure 8.8 shows the AFD contour map of the interconnect structure. The wire
end that connects to the via below has large AFD, while small AFD is found at the
other end. The result matches well with the theoretical analysis and experimental
observations from real EM tests in literature [30, 31].
the adjacent dielectric are removed from the wafer at different rates during CMP,
creating surface anomalies and a varying topography [33]. Many factors, including
pattern geometry (e.g., line density), affect the material removal rates. The CMP
process strives to achieve flat topography to improve yield. Dishing is a copper
surface anomaly caused by CMP. Dishing occurs when copper recedes below the
level of the adjacent dielectric. Although dummy metal fills are added in order to
achieve a flat topography, dishing still occurs (Fig. 8.9).
3.1.2 EPE
dishing
necking
bumping
25
20
15
10
10 20 30 40 50
wire length (um)
We define the critical region of a wire as a region where the AFD is greater than
10 % of AFDmax (the maximum value of AFD along the entire wire). The critical
region boundary position is found to be a piecewise function of wire length and
current density as shown in (8.13), where k1 D 0.85, k2 D 1.65e-12, CR D 0.85e6,
CL D 1.02e16 and CC D 0.0915 are constants, l is the wire length and j is the current
density.
(
0 0l < CL
BL D k1 j
k1 l CL
l CL
( j k1 j
(8.13)
k2 .j C CC / l 0 l < CR
k1 j 2
BR D
k1 l Cj R2 l CR
k1 j 2
A sample curve for current density 40 mA/mm2 is shown in Fig. 8.12. In this figure,
the square- and circle-marked lines represent distances from the cathode via to the
right and left boundaries of the critical region. The double arrowed lines mark the
lengths of critical regions. It can be observed that for a fixed current density, the
length of critical region does not change.
To quantify the effects of process variation we introduce the variation coefficient.
Definition 1
Variation coefficient Cvar is the ratio of an ideal wire EM lifetime tlife0 and EM
lifetime tlife of the corresponding wire with geometric variations.
158 D. Li et al.
Cvar
simulated
1.0
fitted h0
Cvar=kCMP.(h'/h0) h’
0.8
0.6
0.4
0.2
h'/h0
0.2 0.4 0.6 0.8 1.0
tlif e 0
Cvar D (8.14)
tlif e
Unlike for process variation induced global effects on EM lifetime, when consider-
ing local effects, we assume that the amount of charge passing through a wire does
not change.
There are many factors that affect the EM lifetime of a wire with CMP/EPE
variations. However, instead of investigating those detailed physical effects, we
try to extract the major components that are most important to EM lifetime with
variations and develop a simplified model that can be easily applied to full-chip
interconnect EM analysis.
The ratio between an imperfect and ideal wire height (h’ /h0 ) is the dominant
factor for Cvar of CMP dishing. This dependency is given by (8.15), and shown in
Fig. 8.13, for kCMP D 1 and m D 2.
m
h0
Cvar D kCMP (8.15)
h0
To determine Cvar of EPE necking, we first consider the case where a single
circular dent is present on a wire. The dent can be completely described by its depth
d, length l and location x, and Cvar is a function of these three parameters captured
by (8.16), where b and w are the length of the critical region and the width of a wire;
8 Layout Aware Electromigration Analysis of Power/Ground Networks 159
BL BR
Cvar
12 simulated
d=0.03um, l=1um
10 fitted
2
X(um)
10 20 30 40
kEPE is a fitted parameter equal to 7.2e18. Figure 8.14 shows a sample plot for Cvar
of EPE necking with d D 0.03 m and l D 1 m.
8
< 1x BL or x BR
d l
Cvar D f ; ;x D 1 C kEPE bl dw .x BL / BL < x .BL C BR / =2
w b :
1 kEPE bl dw .x BR / .BL C BR / =2 < x < BR
(8.16)
When multiple dents are present on a wire, their cumulative effect is found to be the
product of Cvar caused by each dent, given by (8.17).
Y
Cvar D Cvar;i (8.17)
i
Vdd Vdd
Vdd Vdd
C C D D
C C D D
A A B B
A A B B
Natural
boundary
4 Full-Chip EM Analysis
Power grid of a VLSI chip carries unidirectional currents of large magnitudes and it
is the most EM-vulnerable part of the on-chip interconnect network. In this chapter,
we focus on power grid wires only.
In Fig. 8.15, we show the circuit-level model of power grid used in our work.
We assume that power is delivered through a controlled collapse chip connection
(C4) package. Wires and vias are modeled as resistors, external voltage supplies are
modeled as ideal voltage sources, and switching transistors are modeled as current
sources.
There are many different power grid EM analysis flows depending on data format
types used to describe the power grid. In this chapter, we use the SPICE format. The
link between the SPICE netlist naming and numbering scheme for the circuit and
the original geometry of the power grid is described below.
• Node name:
n<net-index>_<x-location>_<y-location>
• Data associated with each layer starts from:
* layer: <name>,<net>_net: <net-index>
Each layer/net combination is associated with a unique net-index.
• Vias starts from:
* vias from: <net-index> to <net-index>
Vias are implemented as resistors or as zero voltage sources.
• Current source:
8 Layout Aware Electromigration Analysis of Power/Ground Networks 161
150
VSS Contact
125
100
M2 VDD M2 VSS
75
M1 VDD
50
M1 VSS
25
VDD Contact
0
0 25 50 75 100 125 150
Blech length effect can be used to quickly filter out a great number of EM-immortal
wires which do not require further consideration. The remaining EM-mortal wires
will be processed by a more time-consuming EM lifetime calculation procedures.
The studies of Blech length effect usually consider simple straight point-to-point
wires, for which the values of j and L can be easily determined. For complex
interconnect topologies the accumulated jL product along the longest possible path
is usually taken. However, in [36], the authors show that it is non-trivial to extract
jL product in a complex interconnect topology, and they develop an effective jL
extraction strategy based on EM physics, as stated in (8.18). In (8.18), jk and Lk are
the current density and length of a wire segment. The sum is taken over all possible
paths in the network, with (jL)eff being the maximum of these sums. The effective
162 D. Li et al.
The effective jL product extraction (8.18) can be applied to any complex intercon-
nect structure. But instead of computing all paths in a power grid, the effective jL
can be obtained as follows. For any power wire segment, find the longest consistent
current path that contains it and compute the sum of individual jL products of all
wire segments along the path. For example, in Fig. 8.17, paths A, B and C are
longest consistent current paths in the given power track, so we have (8.19).
The Black’s equation given by (8.2) has its limitations, since it ignores some
physical factors that can substantially affect EM lifetime such as critical stress for
void nucleation, material effective modulus and diffusivity. Many physics-based EM
lifetime calculations are developed in literature. We adopt the method presented in
[36], which obtains EM lifetime based on the void nucleation and growth model. It
also provides EM lifetime calculation for complex interconnect topologies, which
can be applied to a power grid.
Here, we briefly explain how to calculate EM lifetime of a power grid wire
segment. Detailed discussion can be found in [36]. In Fig. 8.17, arrows denote
electron flow in a wire segment. First, each wire segment EM failure is associated
with its cathode via. So we have (8.20).
tL1 D tvia1
tL2 D tvia2 (8.20)
tL3 D tL4 D tvia4
8 Layout Aware Electromigration Analysis of Power/Ground Networks 163
Next, EM lifetime of a via depends on the metal atomic flux divergences contributed
by all the connected wire segments. Due to the flux divergence, a void is nucleated
when threshold stress is reached. The time for a void nucleation comprises the first
part of (8.21). After a void is nucleated, it keeps growing until it reaches a critical
length such that EM failure occurs. The time for void growth comprises the second
part of (8.21).
0 X p 12
r r Di
B void kT i C
tlif e B
D@ X C C Lc kT X 1 (8.21)
eZ 4 B A eZ
D i ji D i ji
i i
In (8.21), void is the critical stress required to nucleate a void, ji and Di are the
current density and the effective diffusivity of each wire segment i connected to
a via, B is the effective modulus of the materials surrounding the metal, Lc is the
critical void length, and T is the temperature. In reality, if the actual chip temperature
profile is available, it should be used for EM lifetime calculation. However, here
we assume temperature is uniformly fixed to 300 K. The effective diffusivity D is
computed using equation (8.22):
Ea w
D D D0 e kT (8.22)
h
In (8.22), w is the wire width and h is its height. When all Di values are the same,
which is typical in a power grid, the lifetime of a wire segment and thus the lifetime
of its cathode via is inversely proportional to the current density through the via.
The parameter values we used are listed in Table 8.2.
Wires are classified into three subgroups based on their analytical lifetime: EM-
safe, EM-sensitive and EM-weak. The classification criteria are listed below.
EM-safe: tlife > (1 C ˇ) treq ;
EM-weak: tlife < treq ;
EM-sensitive: treq tlife (1 C ˇ)treq ,
where treq is the required lifetime, assumed here to be 10 years. ˇ is the lifetime slack
threshold; its value is selected according to the maximum lifetime reduction found
164 D. Li et al.
by the variation-aware analysis. The wires with lifetime tlife > (1 C ˇ) treq are EM-
safe and should not fail even if they experience the worst process variation effects.
Wires with treq tlife (1 C ˇ)treq are considered EM-sensitive and their status is
determined by the variation-aware analysis using the compact model. A user can
either use a pre-estimated upper bound of ˇ, or run several iterations of our flow
to tune the value of ˇ. For example, if initially the value of ˇ is set to 10 %, and
later variation-aware analysis reports a 12 % maximum lifetime reduction, then the
user would adjust ˇ value to a number greater than 12 %. In practice, ˇ is within a
range of 10–15 %, thus the convergence of tuning ˇ is not considered a problem in
our work. All EM-sensitive wires will then be processed by the variation-aware EM
analysis.
CMP/EPE variations not only locally affect EM lifetime, but also cause global
power grid current redistribution from full-chip analysis point of view. However,
we need to know the exact CMP/EPE variations on the power grid to study those
redistribution effects.
For experimental purpose, to create the environment for studying global current
distribution changes due to resistance change, we apply Monte-Carlo simulation.
Monte-Carlo simulation samples are generated by first randomly throwing some
darts on the P/G network. Each such selected resistor, referred to as c-resistor, is a
variation center. Its own resistance and those of a few wires around it are changed.
Each c-resistor randomly picks a value according to a Gaussian distribution
N(, 2 ), where is mean and 2 is variance. This variation then propagates to
its neighbors with decreasing magnitude as the distance to the c-resistor increases.
The assumption is that wires in close layout proximity have similar imperfections
due to similar process conditions. Figure 8.18 gives an example.
In Fig. 8.18, the nominal value of each horizontal resistor is 5 and of vertical
resistor is 4. A is an EM-sensitive wire. B and C are c-resistors in the perturbed
network. The perturbed resistance values are shown in Fig. 8.18b. An EM-sensitive
wire segment may also become a c-resistor.
Currents of the EM-sensitive wires have to be computed in the perturbed
grid. Solving Kirchhoff’s Current Law (KCL) or Kirchhoff’s Voltage Law (KVL)
equations for the whole system each time a perturbed network is generated is
infeasible due to a huge computational cost. Here, knowing the node voltages in the
nominal network, we apply random walk method [38] to compute current flows in
its perturbed variant. Random walk is a technique that allows localized computation;
its principle is briefly described below using Fig. 8.19.
A random walk starts from any node x, and then travels to some adjacent
node according to a certain probability as in (8.23) which depends on electrical
conductance between the two nodes. In (8.23), Vx is the voltage of the node x,
degree(x) is the number of nodes adjacent to x, gi is the electrical conductance
8 Layout Aware Electromigration Analysis of Power/Ground Networks 165
X
degree.x/
gi Ix
Vx D Vi (8.23)
iD1 X
degree.x/
X
degree.x/
gj gj
j D1 j D1
Lifetimes of all EM-sensitive wires are then recalculated for each perturbed
network. Two distributions are obtained: lifetime distribution of each EM-sensitive
wire and cumulative distribution function (CDF) of the full-chip EM reliability.
The variation-aware EM analysis tool VEMA includes four major steps: fast jL filter,
analytical EM lifetime calculation, global current redistribution and local variation
effect analysis. Figure 8.20 shows a flowchart for our variation-aware EM analysis
tool VEMA.
g2
g1
1 3
x
g3
g4
Ix
5 Experimental Results
We implemented our tool in CCC and tested it with several publicly released IBM
P/G benchmarks [35]. Experiments were carried on GUN/LINUX server with Intel
Xeon E5440 2.83 GHz CPU and 16 GB memory. The specification required EM
lifetime (referred to as treq ) was set to 10 years and the temperature was assumed to
be 300 K for all experiments.
In Fig. 8.21, we compare jL filter results from the accumulated and effective
jL product extraction for metal three layer in IBM PG1 benchmark. An obvious
reduction in EM-mortal wires can be observed in effective jL product distribution.
This is because the accumulated jL extraction uses the longest possible path for an
entire power track, thus a large number of wires are unnecessarily classified as EM-
mortal. This imposes an onerous burden on further EM analysis and significantly
diminishes the efficiency of fast jL filter.
Moreover, there is a risk of missing EM-mortal wires using the accumulated
jL product. For example, in Fig. 8.22, two currents with the same magnitude but
opposite directions are flowing from two ends of a power track and meet at the
middle point. The accumulated jL extraction will classify this wire as EM-immortal,
because the accumulated jL product would be zero, whereas the effective jL product
extraction suggests computing jLs for the two consistent paths separately. The
quantitative results of jL filter are listed in Table 8.3.
When considering global effects of process variations, we assume that 10 % of
wires experience variations, with each c-resistor Ri following N(Ri , 0.001Ri 2 ) distri-
bution. We simulated the global effects of process variation on a randomly selected
EM-sensitive wire, named W2 , taken from IBM PG2 benchmark. Figure 8.23 shows
the lifetime histogram with a trend line for W2 . The solid vertical line denotes tsp and
the dotted line marks MTTF for W2 . It can be observed that with probability 0.84,
W2 is safe from EM failure taking into account global effects of process variations;
Psafe,w2 D 0.84.
Table 8.3 Experimental results
Mortal Worst MTTF Avg. prob. no Avg. CMP Avg. EPE
Benchmarks Total #wires A E Miss (year) #EM-sensitive failure (%) tolerance (%) tolerance (%)
IBM PG1 30027 25196 10032 623 4.07 46 32.7 15.74 8.50
IBM PG2 208325 111245 23124 77 4.94 63 89.7 16.24 15.41
IBM PG3 1401572 33872 33309 546 4.90 917 82.0 14.37 12.04
IBM PG4 1560645 198103 66226 0 5.53 554 74.6 12.53 11.06
8 Layout Aware Electromigration Analysis of Power/Ground Networks
167
168 D. Li et al.
Begin
Read in P/G
Network (res, via)
N
jL>jLBlech? EM-immortal
Y
Initialize β N
tlife>(1+β)*treq? EM-weak
N
β>Cvarmax-1?
Y
Output EM
reliability report
End
Y
pq
PCDF .q/ D Psaf e;i (8.24)
iD0
8 Layout Aware Electromigration Analysis of Power/Ground Networks 169
Figure 8.24 shows the CDF curve for IBM PG2 benchmark. Point A on the CDF
curve corresponds to probability 0.95 of no more than eight EM-sensitive wire
failures, and the gray-shaded region beneath point A is highly sloped, elsewhere
it is gently sloped. This can be interpreted that strengthening the worst eight EM-
sensitive wires can boost EM-safe probability to 0.95; but beyond that due to gently
sloped region, the benefit of strengthening more EM-sensitive wires will be very
small.
Using the variation-aware compact EM lifetime model, we perform EM analysis
considering CMP/EPE variations. In Fig. 8.25, we show the EM lifetime distribution
for a randomly selected EM-sensitive wire W4 from IBM PG4 benchmark. For W4 ,
we assume 10 % EPE-caused area loss within the critical region. Local effects
caused by CMP and EPE are similar and manifest themselves as a shift of EM
lifetime distribution which reduces the ideal EM lifetime MTTF0 to MTTF’ and
causes a higher EM failure probability. The maximum achievable distance between
MTTF0 and treq indicates how critical EPE effect on MTTF is.
Fig. 8.21 EM-mortal map for IBM PG1. (a) Accumulated mortal map (b) effective mortal map
J J
Path A Path B
L L
Accumulated jL: jL+(-j)L=0
Modified jL: jLA=jL
jLB=jL
Definition 2
CMP criticality is defined as a ratio of the original wire height over the maximum
wire height loss that a wire can tolerate while keeping tlife > treq . EPE criticality is
defined as the ratio of original total critical region area over the maximum area loss
that a wire can tolerate while maintaining its lifetime greater than the required limit,
tlife > treq .
8 Layout Aware Electromigration Analysis of Power/Ground Networks 171
Fig. 8.26 Variation criticality map for IBM PG4. (a) CMP criticality (b) EPE criticality
In Fig. 8.26, we report CMP/EPE criticality map for metal five layer in IBM
PG4 benchmark. For high criticality EM-sensitive wires process variation induced
printing imperfections should be minimized by techniques such as better OPC or
more careful insertion of dummy metal fill.
All these results are quantitatively summarized in Table 8.3. In Table 8.3, the
sub column labeled A under column Mortal provides the accumulated jL extraction
values, the sub column labeled E refers to effective jL extraction and the sub column
labeled Miss gives the number of EM-mortal wires missed by the accumulated jL
extraction. The column labeled Avg. prob. no fail is the average probability that
EM-sensitive wires will not fail.
Conclusions
As technology scales down, EM is becoming progressively a more serious
problem. It posts huge challenge on analysis tools and limits the application
of conventional EM theories such as Black’s equation and Blech length effect.
Moreover, the process variation brings in more uncertainty to EM analysis.
In this chapter, we introduced the basic background knowledge of EM, and
described the tool VEMA, which is a post-design EM analysis and reliability
evaluator. Fast jL filter and ideal EM lifetime calculation are performed to
quickly filter out EM-immortal wires and classify all EM-mortal wires into
three subgroups: EM-weak, EM-safe and EM-sensitive. Variation-aware EM
analysis is applied on EM-sensitive wires. The variation effects mainly refer
(continued)
172 D. Li et al.
to the changes of wire geometry caused by CMP and EPE that affect the EM
lifetime of individual wires. A compact variation-aware EM lifetime model
is developed. Using this model, variation tolerance of each EM-sensitive wire
can be fed back to designers. As technology is scaling down, process variation
on wires becomes increasingly more significant. Our work shows that its
effect on EM reliability is non-negligible, and a variation-aware EM analysis
such as VEMA may provide a more realistic assessment of the EM reliability
of power grids.
Acknowledgments Research support was provided in part by SRC through IBM Corporation and
in part by NSF grant CCF-1115663.
Many people have contributed at various stages to this book chapter. In particular we would
like to thank our collaborator from IBM, Dr. Baozhen Li for motivating this work and for support.
We also thank Xiang Qiu, Zhong Guan and Ali Abbasinasab from the University of California,
Santa Barbara for their helpful discussions.
There also have been many discussions with professors at the University of California, Santa
Barbara. In particular, we would like to thank Professor Tim Cheng and Professor Li-C Wang.
References
1. http://en.wikipedia.org/wiki/Electromigration.
2. J. Lienig, “Invited Talk: Introduction to Electromigration-Aware Physical Design,” Interna-
tional Symposium on Physical Design, 2006, pp. 39-46.
3. F. Wei, C. L. Gan, T. L. Tan, et.al, “Electromigration-induced extrusion failures in Cu/low-k
interconnects,” Journal of Applied Physics, 104, 023529 (2008).
4. R. S. Sorbello, “Theory of the Direct Force in Electromigration,” Phys. Rev. B, vol. 31, no. 2,
pp. 798-804, 1985.
5. J. R. Black, “Electromigration - A Brief Survey and Some Recent Results,” IEEE Trans. on
Electron Devices, Vol. ED-16 (No. 4), pp. 338-347, April 1969.
6. I. A. Blech, “Electromigration in thin aluminum films on titanium nitride,” J. Appl. Phys.,
vol. 47 (1976), pp. 1203–1208.
7. I. A. Blech and C. Herring, “Stress Generation by Electromigration,” Appl. Phys. Lett., vol. 29,
no. 3, pp. 131-133, 1976.
8. I. A. Blech and K. L. Tai, “Measurement of Stress Gradients Generated by Electromigration,”
Appl. Phys. Lett, vol. 30, no. 8, pp. 387-389, 1977.
9. C. Herring, “Diffusional Viscosity of a Polycrystalline Solid,” J. Appl. Phys., vol. 21,
pp. 437-445, 1950.
10. http://www.iue.tuwien.ac.at/phd/orio/node26.html.
11. E. T. Ogawa, A. J. Bierwag, K.-D. Lee, H. Matsuhashi, P. R. Justinson, and et al., “Direct
Observation of a Critical Length Effect in Dual-Damascene Cu/Oxide Interconnects,” Appl.
Phys. Lett., vol. 78, no. 18, pp. 2652-2645, 2001.
12. D. Ney, X. Federspiel, V. Girault, O. Thomas, and P. Gergaud, “Stress-Induced Electromi-
gration Backflow Effect in Copper Interconnects,” Trans. Dev. Mater. Reliab., vol. 6, no. 2,
pp. 175-180, 2006.
13. L. Doyen, E. Petitprez, P. Waltz, X. Federspiel, L. Arnaud, and Y. Wouters, “Extensive
Analysis of Resistance Evolution due to Electromigration Induced Degradation,” J. Appl.
Phys., vol. 104, p. 123521, 2008.
8 Layout Aware Electromigration Analysis of Power/Ground Networks 173
14. A. S. Oates and M. H. Lin, “Void Nucleation and Growth Contributions to the Critical Current
Density for Failure in Cu Vias,” Proc. Intl. Reliability Physics Symp., pp. 452-456, 2009.
15. J. W. McPherson and P. B. Ghate, “A methodology for the calculation of continuous dc electro-
migration equivalents from transient current waveforms,” in Proc. Symp. on Electromigration
of Metals, New Orleans, LA, pp. 64-74, Oct. 7-12, 1984.
16. C. K. Hu, R. Rosenberg, H. S. Rathore, et.al, “Scaling Effect on Electromigration in On-Chip
Cu Wiring,” International Conference on Interconnect Technology, 1999, pp. 267-269.
17. N. Srivastava, K. Banerjee, K. E. Goodson, “Scaling Analysis of Multilevel Interconnect
Temperatures for High-Performance ICs,” IEEE Trans. on Electron Devices, volume 52, issue
12, 2005, pp. 2710-2719.
18. B. Li, C. Christiansen, C. Burke and et al., “Short Line Electromigration Characteristics and
their Applications for Circuit Design,” International Reliability Physics Symposium, 2013,
pp 3.F.2.1-3.F.2.5.
19. M. Lin, N. Jou, W. Liang and K. C. Su, “Effect of Multiple Via Layout on Electromigration Per-
formance and Current Density Distribution in Copper Interconnect,” International Reliability
Physics Symposium, pp. 844-847, 2009.
20. N. Raghavan and C. M. Tan, “Statistical Modeling of Via Redundancy Effects on Interconnect
Reliability,” International Symposium on the Physical and Failure Analysis of Integrated
Circuits, pp. 1-5, 2008.
21. D. Li, Z. Guan, M. Marek-Sadowska and S. R. Nassif, “Multi-Via Electromigration Lifetime
Model,” International Conference on Simulation of Semiconductor Process and Devices,
pp. 308-311, 2012.
22. http://en.wikipedia.org/wiki/Activation_energy.
23. http://en.wikipedia.org/wiki/Copper_interconnect.
24. R. L. D. Orio, “Electromigration Modeling and Simulation,” doctoral dissertation, Institute for
Microelectronics, TU Wien, 2010.
25. D. Li, M. Marek-Sadowska and S. R. Nassif, “A Method for Improving Power Grid Resilience
to Electromigration-Caused Via Failures,” to appear in IEEE Trans. on VLSI Systems.
26. D. Dalleau, “3-D Time-depending Simulation of Void Formation in Metallization Structures”,
Doctoral thesis, University of Hannover, 2003.
27. W. Li, C. M. Tan and N. Raghavan, “Dynamic simulation of void nucleation during electro-
migration in narrow integrated circuit interconnects,” Journal of Applied Physics 105, 014305
(2009).
28. http://www.ansys.com.
29. M. Lin, N. Jou, W. Liang and K. C. Su. Effect of Multiple Via Layout on Electromigration
Performance and Current Density Distribution in Copper Interconnect. IEEE 47th Annual
International Reliability Physics Symposium, Montreal, 2009, pp. 844-847.
30. B. Li, J. Gill, C. J. Christiansen, et. al. Impact of Via-Line Contact on Cu Interconnect Electro-
migration Performance. IEEE 43rd Annual International Reliability Physics Symposium, San
Jose, 2005, pp. 24-30.
31. http://en.wikipedia.org/wiki/Chemical-mechanical_planarization.
32. F. M. Serry, D. Dawson, “Minimizing Dishing and Erosion in Copper CMP,” http://www.veeco.
com/pdfs/database_pdfs/minimizing_de_in_copper_cmp_45.pdf.
33. https://www.si2.org/openeda.si2.org/dfmcdictionary/index.php/Edge_Placement_Error.
34. S. R. Nassif, “Power grid analysis benchmarks,” Asia and South Pacific Design Automation
Conference, 2008, pp. 376-381.
35. S. P. Hau-Riege, “New Methodologies for Interconnect Reliability Assessments of Integrated
Circuits,” Doctoral thesis, Massachusetts Institute of Technology, 2000.
36. C. W. Chang, Z. –S. Choi, C. V. Thompson, et.al, “Electromiration resistance in a short three-
contact interconnect tree,” Journal of Applied Physics 99, 094505 (2006).
37. H. Qian, S. R. Nassif, and S. S. Sapatnekar, “Power Grid Analysis Using Random Walks,”
IEEE Trans. on CAD, vol. 24, no. 8, August 2005, pp. 1204-1224.
38. S. M. Alam, C. L. Gan, C. V. Thompson, et al, “Reliability computer-aided design tool for
full-chip electromigration analysis and comparison with different interconnect metallizations,”
Microelectronics Journal 38 (2007) 463-473.
Chapter 9
Power-Gating for Leakage Control and Beyond
Abstract The need of reliable nanometric integrated circuits is driving the EDA
community to develop new automated design techniques in which power consump-
tion and variability are central objectives of the optimization flow.
Although several Design-for-Low-Power and Design-for-Variability options are
already available in modern EDA suites, the contrasting nature of the two metrics
makes their integration extremely challenging. Most of the approaches used to
compensate and/or mitigate circuit variability (e.g., Dynamic Voltage Scaling and
Adaptive Body Biasing) are, in fact, intrinsically power inefficient, as they exploit
the concept of redundancy, which is known to originate power overhead.
In this work, we introduce possible solutions for concurrent leakage minimiza-
tion and variability compensation. More specifically, we propose Power-Gating as
a mean for simultaneously controlling static power consumption and mitigating the
effects induced by two of the most insidious sources of variability, namely, Process
Variations (PV) due to uncertainties in the manufacturing and Transistor Aging due
to Negative Bias Temperature Instability (NBTI).
We show that power-gating, when implemented through the insertion of dedi-
cated switches (called sleep transistors), has a double effect: On one hand, when
sleep transistors are enhanced with tunable features, it acts as a natural supply-
voltage regulator, which implements a control knob for PV compensation; on the
other hand, during the idle periods, it makes the circuits immune to NBTI-induced
aging.
We describe optimization techniques for the integration of a new concept of
power-gating into modern sub-45 nm design flows, that is, Variation-Aware Power-
Gating. The experimental results we have obtained are extremely promising, since
they show 100 % timing yield under the presence of PV and circuit lifetime
extension of more than 5 in the presence of NBTI.
1 Introduction
MOS devices are approaching the size of atoms, which is a fundamental barrier for
the scaling process of the bulk technology. The research community is thus actively
pursuing alternative materials, fabrication processes, devices and architectures
to be adopted in mainstream circuit and system manufacturing. While several
new solutions have shown good potential (e.g., carbon nanotubes and graphene,
memristors, spin-based devices, ferromagnetic logic, atomic switches, NEMS), the
debate of which of them will prevail is still open; therefore, nanometer CMOS will
remain the dominant technology on the electronics market for a few years.
The 2009 ITRS Roadmap [17] reported that static power consumption and
variability are the most serious concerns for the design of nanometer ICs below
45 nm. Static power due to internal leakage mechanisms represents the main source
of power consumption in modern CMOS circuits [28], which have shown to be
power-hungry even when not switching. Variability, instead, refers to the marked
tendency of a manufactured circuit to show a deviation from its nominal behavior.
Main sources of variability include random and systematic process variations [4],
environmental condition variations [33] (e.g., temperature and Vdd fluctuations)
and aging effects (e.g., Negative Bias Temperature Instability and Hot Carrier
Injection) [1]. While static power translates to low energy efficiency, variability
originates lower reliability and lower fabrication yield; both factors make electronic
circuits less reliable.
Static power and variability are not new in the EDA field, and various options for
their management are already available. However, while in the past considering the
two as independent variables in the design space was accurate enough to obtain rea-
sonable results, advanced technology nodes are showing the need of holistic design
strategies that could provide concurrent static power optimization and variability
compensation. Unfortunately, this challenge is complicated by the fact that most of
the design solutions for compensating variability are intrinsically power inefficient:
Fault-tolerance approaches, such as “fail and correct”, or adaptive strategies, such as
“dynamic voltage regulation” or “forward body biasing”, are based on the concept
of redundancy, which is in contrast with low-power requirements.
In this work, we address this critical issue and we propose the use of
power-gating [2] as the enabling technology for achieving simultaneous leakage
optimization and variability compensation.
Power-gating (PG) is based on the insertion of a dedicated MOS switch, called
sleep transistor, between the gated block and the actual ground rail. This provides
the circuit with two power modes: A low-power mode, during which the sleep
transistor is turned-off and the leakage power is reduced by more than one order
of magnitude, and an active mode, during which the sleep transistor, which is
turned-on, guarantees a normal connection to the global power-rails. What is
interesting to note is that, in terms of variability, the sleep transistor shows important
and unique properties, that is: During the active periods, when enhanced with
tunable-width features, it can act as a natural supply-voltage regulator usable to
9 Power-Gating for Leakage Control and Beyond 177
Among all the leakage current mechanisms induced by Short Channel Effects
(SCEs), the sub-threshold current Isubth has proven to be the major contributor
to the total static power consumption [28].
Isubth is defined as the drain-to-source current which flows when the transistor
operates in the weak inversion region, i.e., when the gate voltage Vg is below the
threshold voltage Vth . Under this condition, the channel shows a small, but non-
zero concentration of minority carriers that are diffused from the drain to the source
terminal whenever a potential greater than 0 is applied between drain and source,
i.e., Vds > 0.
178 A. Calimera et al.
A well-known model for the Isubth of a single nMOS transistor is given by the
following equation [28]:
W Vg Vth Vds
Isubth D Cox .m 1/v2T e mvT .1 e vT / (9.1)
L
with
Cd m
mD1C (9.2)
COX
Process variations (PV) [4, 5] are mostly due to random fluctuations of dopant
atoms, which result in the mismatching of the electrical characteristics of transistors
in the same die, and to systematic or non-systematic impreciseness of the manu-
facturing process (like lithography, etching, and chemical-mechanical polishing).
Process variations have a significant impact both on the power dissipation and
performance of a design: 20 variation in leakage power for a 1.5 variation in
delay between fast and slow dies has been reported in the literature. Given the
power/performance tradeoff, those figures translate into an increase of dies that
must be discarded because either too slow or too power consuming. Therefore,
as a relevant side effect, increased variability decreases yield, with important cost
implications.
Aging, instead, includes all those wear-out mechanisms that induce time-
dependent degradation of the operating characteristics of devices [1]. Two are the
main sources of aging in active devices: Bias Temperature Instability (BTI), and
Hot Carrier Interface (HCI) [26]. Both these physical/chemical effects result in the
generation of interface traps at the silicon/oxide interface and cause a drift of the
threshold voltage over time. These irreversible effects, and BTI in particular, have
traditionally been regarded as “reliability issues”, and have only recently received
some consideration in the CAD community as a factor affecting performance
of digital circuits. Traditional VLSI design, in fact, bypasses the analysis and
optimization of such dynamic networks by approximating the problem to the
optimization of uniform static networks with certain guard-band. This may induce
unacceptable design overheads.
180 A. Calimera et al.
VDD
Logic Block
VGND
SLEEP nMOS
Clustering
Layout Modification
Validation
To Router
In order to fully integrate clustering and sleep transistor insertion with a state-of-
the art physical design flow, placement and routing information have to be taken into
account within the core clustering algorithm that selects gates to be power-gated, as
well as in sleep transistor sizing and insertion.
Section 3.2 describes a common sleep transistor insertion flow suitable for the
variation-aware clustered strategies proposed in this work.
This section describes a methodology for estimating the maximum current drawn
by a cell cluster [30]. We define as Switching Window, T , of a gate under an
input pattern the interval between the arrival time and the output transition of the
gate. There is one switching window for each path through a particular gate. The
width of the switching window of a gate is equivalent to the propagation delay
of the gate for a rising or falling transition. For a given gate in the cluster, its
Maximum Switching Window (MSW) is the time interval encompassing all its
possible switching windows. If a gate has n switching windows, the set T1 ; : : : ; Tn
is called the Full Switching Window (FSW) of the gate.
The peak current estimation algorithm consists of two steps. The first one
computes the MSW for each gate in the cluster. The motivation behind extracting
the MSW and not all the switching times of a gate is that, for a complex circuit and
for a gate very deep in the logic, there is possibly a very large number of switching
time intervals, each one corresponding to a path being activated through a gate.
Moreover, it is very time consuming to extract all these windows for all the gates,
which form a cluster. Conversely, it is very fast to extract the MSW for each gate. In
S window T1 and
fact, it only requires the calculation of the first (earliest) switching
of the last (latest) one Tn ; the MSW is simply obtained as (T1 Tn ). We thus need
to extract only two switching time intervals for each gate in the cluster, which can
be accomplished very fast.
The second step consists of the construction of a current plot over time which
records the gates that switch at a particular time interval and, hence, the total
discharge current in that time interval. This plot is the superposition over time
of rectangles whose bases correspond to the MSWs of the individual gates of the
cluster (which will, in general, partially overlap), and whose heights correspond to
the currents drawn by the gates (e.g., derived by a technology library). From this
current plot, the time interval during which the maximum current discharge occurs
and the gates that contribute to this current discharge are obtained.
Figure 9.4 shows an example of current plot, in which five MSWs (from a to e)
are shown. The interval in which the current drawn is maximum corresponds to the
overlapping of the c, d , and e MSWs.
The current plot allows the identification of the time interval during which the
maximum current discharge could occur. To tighten this upper bound, we need to
extract the detailed FSW information only for those gates which contribute to this
maximum value, which are normally a very small percentage of the gates in the
cluster.
The algorithm enters then an iterative phase; it proceeds by finding, based on the
current plot, the subset of gates that contribute to this maximum current value. Gate
ordering on this subset of gates is executed first, so as to maximize the probability
of non-overlapping switching of these gates. After ordering the gates, gate-by-gate
FSW extraction is performed on the set of gates which contribute to the maximum
184 A. Calimera et al.
current. Once all the possible switching time intervals for this small subset of
gates are extracted, the current plot is updated and the process is repeated until
convergence is reached.
Convergence is guaranteed since, in the worst case, we have to extract the FSW
for all the gates in the cluster and compute their maximum currents. When full
FSW extraction is computationally too expensive, we may terminate the iteration
early, for example by using a time bound. Since the maximum current estimate
is monotonically non-increasing as the iteration proceeds (i.e., the upper bound is
progressively tightened), the use of a time bound provides us with a fine-grained
control of the accuracy vs. time trade-off.
VDD ˛drop
Rst D (9.3)
Ion
where VDD ˛drop is the allowed voltage drop across the sleep transistor, expressed
as a percentage of the supply voltage, while Ion is the maximum discharge current
9 Power-Gating for Leakage Control and Beyond 185
that power-gated cells inject into the sleep transistor during the active mode. The
estimation of the active current is not a trivial task. In fact, an erroneous estimation
of Ion translates to a sub-optimal sleep transistor sizing, which may result in area
and power increase or, even worse, in timing violations during the active mode [30].
Considering that the sleep transistor operates in the resistive region and knowing
Rst , its size can be properly evaluated using Eq. (9.4):
W 1
D (9.4)
L st Rst st COX .VDD Vthst VDD ˛drop /
where W =L is the ratio between width and length of the transistor channel, st is
the carrier mobility, COX is the oxide capacitance and Vthst is the threshold voltage.
The size of the sleep transistor is strongly influenced by its physical implemen-
tation, where carrier mobility, threshold voltage and gate length are key parameters.
Hence, for a proper sleep transistor sizing, indicating the type of MOS device is
mandatory: pMOS (i.e., header) or nMOS (i.e., footer). Although both devices can
be used as power-switches without distinction (i.e., maintaining the same leakage
reduction), nMOS transistors are usually more suitable. In fact, pMOS devices are
less leaky than nMOS ones, but they show a lower carrier mobility, which limits their
ON-current. As a result, in order to guarantee a certain current capability, pMOS
sleep transistors require more silicon area compared to nMOS.
In a typical power-gating approach, switch devices are high-threshold-voltage
transistors (i.e., HVT). Since a HVT transistor is less leaky, this choice helps
in reducing the total static consumption when the circuit is in stand-by mode.
Unfortunately, a larger threshold voltage reduces the current capability of the
transistor and, as demonstrated by Eq. (9.4), more area is required to achieve the
optimal resistance. On the contrary, with a low-threshold-voltage (i.e., LVT), the
silicon area taken by the switch device can be sensibly reduced, but its leakage
becomes significant.
Another design parameter that should be taken into account is the gate length.
Usually, the sleep transistors are designed with minimum channel length, but
for today’s nanometric technologies, this implies high leakage currents due to
Short Channel Effects (i.e., SCEs) and more power consumption. Increasing the
gate length allows to reduce the power consumption [15], but the current sinking
capability of the sleep transistor is drastically reduced and more area is required.
Moreover, since the majority of the fab processes are optimized for a single channel
length (typically, the minimum length), using devices with multiple lengths may
drastically increase process variation and sensibly reduce fabrication yield.
As discussed above, the sleep transistor has to be sized in order to limit the IR-
drop on the virtual rail when the maximum discharge current of the power-gated
cells is injected into the virtual ground. Under this IR-drop constraint, designers
can trade off channel length L and threshold voltage Vth (see Eq. (9.4)) to achieve
area and power optimization. In principle, maximum area efficiency (i.e., minimum
W ) is obtained using transistors with the largest current density capability, namely,
low-Vth transistors with minimum gate length. Obviously, this causes larger leakage
186 A. Calimera et al.
As anticipated in Sect. 3.1, when power gating is applied to a generic logic block,
an important dimension of the problem concerns whether the gating is applied to:
(1) The entire logic block (block-level or full power-gating); (2) a subset of the cells,
typically, the non-timing critical ones (partial or clustered power-gating); (3) all
the cells individually (cell-level power-gating). The block-level approach has the
drawback of having an high reactivation period since it may have long transition
delay between sleep and active state. The cell-level strategy is characterized by high
area overhead and huge sleep signals buffering. The most promising power-gating
style is then the clustered one, in particular the row-based style, which offers a
reduced overhead with respect to the cell-level approach and limited active/sleep
transition times with respect to the block level strategy.
Fig. 9.5 Example of a complete layout after a row-based sleep transistor insertion
extra rows needed to host the sleep transistor cells are added. As in the block-level
case the pins have to be moved and the SLEEP pin added.
For any clustering strategy, the power grid and its connection to the standard cells
are created. The smallest region that can be power-gated is limited by the distance
from the Vgnd power rail. The maximum distance from the Vgnd is equal to the
distance between two power stripes of the same type divided by two. To avoid
having regions of different sizes, the rows are always cut in the same points and
those are determined by the position of the vertical stripes. Each cut, if needed,
is done at the maximum distance from the Vgnd lines. To avoid wrong substrate
polarizations the distance between two stripes of the same type has been kept equal
to the one without power gating, while the width of the stripes has not been changed
in order to avoid the IR drop increase.
In the remainder of this chapter, we will consider row-based sleep transistor
insertion as the reference power-gating strategy. However, the solutions we present
could be successfully adapted to other kinds of sleep transistor insertion approaches.
188 A. Calimera et al.
WID variations are mostly due to systematic and random variations of several
physical device parameters, such as the concentration of doping atoms in the
substrate (Ni ), the effective channel length (Leff ) and width (Weff ) and the oxide
thickness (Tox ). The resulting effect is the shift of the electrical characteristics of
9 Power-Gating for Leakage Control and Beyond 189
the transistors, like the threshold voltage (Vth ) and the maximum current density
(Ids =W ), with a significant impact on the power dissipation and the performance of
a manufactured circuit [6].
Under these conditions, device parameters (and design metrics) have to be treated
as random variables, whose probability distribution function (PDF) depends on the
actual fabrication process. As pictorially summarized in Fig. 9.6, deviations from
the ideal case (i.e., where a given parameter X has a nominal, deterministic value)
are represented by a PDF with a given shape with average value avg.X / coincident
with the nominal value Xnom of the parameter; variability is related to the variance of
the PDF (e.g., the range of values between ˙3 ). Under the typical assumption that
values slightly exceeding the nominal value (Xmax in the figure) can be considered
as acceptable, we define the timing yield as the probability that X < Xmax , which
can be immediately obtained by the cumulative distribution function of X (area
below the PDF and delimited by Xmax ; striped region in the figure).
The active current Ion , drained by a circuit from the power supply during normal
operation, may represent a suitable metric for quantifying the performance degra-
dation induced by process-variation. In fact, the imperfections of the fabrication
process may affect several electrical parameters (such as channel dimensions and
threshold voltage), which in turn alter the actual current capability of the active
transistors, and thus, the intrinsic speed of the devices. Therefore, the larger the
speed degradation due to process variations, the smaller the resulting Ion .
During the active periods, the sleep transistor, that is turned-on and that operates
in the linear region (in Fig. 9.7, Ron represents the channel resistance in the linear
region), behaves as a current-to-voltage transducer that transforms the flowing
current Ion into a voltage drop (Vdrop in Fig. 9.7). The virtual-ground rail is now
190 A. Calimera et al.
at a potential higher than ground, and the circuit operates at a scaled supply voltage
(Vdd Vdrop ). By modulating Ron , and thus Vdrop , it is therefore possible to change
the operating point of the entire circuit: A lower Ron implies a faster circuit; a larger
Ron , on the contrary, will increase Vdrop thus making the circuit slower.
In summary, circuits made slower by PV can recover their speed by means of
Ron reduction, where Ron is proportional to the width of the sleep transistor.
The key aspect of this strategy is that the voltage regulation obtained through
Ron appears only temporarily while the circuit is switching. Once all the internal
switchings are completed, Ion falls to zero and so does Vdrop . In this sense, the
supply voltage Vdd is automatically adapted to the load, and there is no need of
an external voltage regulator. A quantitative evaluation of the above dependency is
given in Table 9.1, which reports delay values of a sample design (10 power-gated
chains in parallel, each consisting of 10 inverters) as a function of the total sleep
transistor width, in the presence of process variations.
Column dnom reports the nominal delay when the circuit is power gated with a
transistor of a given size W (in the example 20 “fingers” of 0.8 m each for a total
of 16 m) and ignoring the presence of variations. Columns dW C report worst case
delay values for different values of W . By worst case, we mean the extreme value
of the delay distribution obtained by randomly selecting 1,000 different instances of
the circuit with Monte-Carlo. The column dW C –W shows then the slowest circuit
instance (152:7 ps) when a transistor of size W is used for gating, i.e., the same
conditions for which dnom is measured. This instance is approximately 17 % slower
than the nominal one. If we then upsize the sleep transistor for such circuit instance,
we see that, as expected, we can progressively speed up the instance, asymptotically
reaching the nominal delay dnom .
9 Power-Gating for Leakage Control and Beyond 191
4.3.2 Architectures
A clustered sleep transistor insertion, on the other hand, may represent the
most appropriate architecture. In fact, tunability is required only for the cells that
determine the critical paths (CPs). On the contrary, all the other cells (i.e., cells for
which the delay increase due to process variations do not slow down the critical
paths) can be power gated with a regular, non-tunable, small-sized sleep transistor.
We call this approach Clustered-Tunable Power-Gating (right configuration in
Fig. 9.8). Under this scheme, two distinct sets of cells (i.e., clusters) use separate
sleep transistors: The critical cells identify the critical cluster C , which is gated by
a tunable transistor of size WC ; the rest of the cells form the non-critical cluster
NC , which is gated by a regular transistor of size WNC . Concentrating tunability
only where needed allows to reduce the total sleep transistor width (WNC C WC
Wmax ), and guarantees power savings while keeping the same delay compensation
capability.
The design methodology for the automated implementation of clustered TPG does
not differ from that described in Sect. 3. Starting from a (row-based) placed design,
clustering, peak-current estimation, sleep transistor sizing and sleep transistors
insertion are the main phases of the flow. However, working with clustered tunable
power-gating architectures requires some changes and additions to the algorithms.
Design of the Tunable Sleep Transistor Cell Tunable sleep transistor cells are
not available in standard CMOS libraries. Then, it is important to support design
kits with new customized cells that contain parallel sleep transistors of different
sizes and driven by dedicated control signals. Figure 9.9 shows the schematic of
a possible architecture, as described in [14] and [32]. Each parallel sub-transistor
is driven by a NAND gate that receives an external configuration bit (b3, b2, b1,
b0, in Fig. 9.9), and whose value can enable (in case of 1-logic) or disable (in case
of 0-logic) the corresponding transistor. An additional sleep signal provided by an
external power-management unit is in charge of defining the operating mode of the
gated circuit.
Statistical Static Timing Analysis At design time, characterizing the effects
induced by process variations is key. In fact, this allows the identification of
the critical paths in presence of process variations. To do so, an option consists
of integrating probabilistic models into standard Static Timing Analysis (STA)
engines. For instance, it is possible to resort to Monte Carlo statistical sampling,
where device parameters are stochastic variables described by process-dependent
PDF. During each sample, the circuit timing is computed using traditional STA
tools. The output of a Monte Carlo analysis includes the path delay distributions.
From that, we can extract the list of statistical critical paths, i.e., the list of paths that
show a certain probability to have a slack smaller than a user defined threshold.
9 Power-Gating for Leakage Control and Beyond 193
Clustering and Sleep Transistor Sizing The clustering phase is crucial, since it
defines which portion of the circuit has to be assigned to which cluster (critical C or
non-critical NC ). At this stage, it is important to identify the granularity at which
the sleep transistors are inserted into the layout. As described in Sect. 3, different
options are available, but row-based insertion can guarantee the finer granularity
at the minimum cost in terms of area and layout disruption. Hence, the clustering
problem comes down to the selection of which rows have to be considered as critical
or non-critical. An effective solution to this problem is to mark as critical those
rows that host at least one gate belonging to a statistical critical path [32]. Once
the clusters C and NC are formed, it is possible calculating their maximum active
currents, IC and INC ; respectively, and performing the sleep transistor sizing which
returns the actual width of the two transistors WC and WNC .
Sleep Transistor Insertion During this stage, the sleep transistor cells belonging
to a customized power-gating library are placed into the layout. If a row-based
approach is adopted, one can follow the strategy proposed in [31], where the
sleep transistors cells are placed in dedicated layout rows, called sleep rows. It is
worth mentioning that the sleep transistor of the critical cluster C is implemented
by means of modular tunable sleep transistor cells connected in parallel. All the
programmable sleep transistor cells are centered in their middle configuration,
namely, the configuration word is set in the middle of the dynamic range (1000
for the 4-bits cell shown in Fig. 9.9). This guarantees the maximal extension of the
compensation range. For the non-critical cluster one can use the same scheme, but
using non-tunable sleep transistor cells of fixed size [8].
Figure 9.10 offers a quantitative analysis of the timing yield and leakage
savings that a clustered tunable power-gating architecture (CLUSTERED-TPG)
may guarantee w.r.t. standard power-gating (FULL-PG) and block-level tunable
194 A. Calimera et al.
power-gating (FULL-TPG). The results are obtained from the simulation of a subset
of the ITC’99 benchmarks mapped onto an industrial 45 nm technology. Only the
averages are reported.
The tunable approach (i.e., CLUSTERED-TPG and FULL-TPG) originates a
timing yield increase from 80.67 % (FULL-PG) to 100 %; in other terms, tunable
power-gating allows to fully recover any speed degradation induced by process
variation, thus making the number of circuits that must be discarded because they
incur timing violations fall to zero.
Concerning leakage power, we observe that a FULL-TPG architecture can
successfully tackle the problem of variability compensation at the price of a drastic
reduction of the achievable leakage savings; due to a larger sleep transistor (leakage
savings go from almost 100 % for the FULL-PG to a mere 24.36 % for FULL-TPG).
On the contrary, a clustered scheme (i.e., CLUSTERED-TPG) guarantees the same
timing yield as FULL-TPG, while maintaining reasonable leakage savings (72.1 %).
Clearly, applying a clustered architecture is what makes the tunable approach
applicable to real-life circuits.
the generation of traps at the interface between the silicon and the oxide, resulting
into an increase over time of the threshold voltage of the transistors.
BTI affects both pMOS (Negative BTI—NBTI) and nMOS transistors (Positive
BTI—PBTI); in current technologies, the impact of PBTI is much lower than that of
NBTI, although its importance is expected to increase with the adoption of high-k
dielectrics in the gate-oxide interface [24]. Conversely, HCI effects are much more
significant in nMOS transistors, and are at least two orders of magnitudes larger
than for pMOS devices [12].
Between NBTI and HCI, NBTI is regarded as the most significant effect, because
the surface along which interface traps are created (i.e., the whole silicon-oxide
interface) is much larger than that of HCI (i.e., in the neighborhood of the drain
area) [1].
We can thus consider NBTI as the dominant source of aging of transistors
in current sub-45 nm bulk CMOS technologies. As we will show in the rest of
this section, the peculiar properties of NBTI and, in particular, its state-based
manifestation will allow to use power-gating as a powerful knob for aging control
while keeping the usual benefits in static power reduction.
NBTI effects occur when a pMOS is negatively biased (i.e., negative Vgs , or a
logic ‘0’ is applied—the stress state) and originate an increase of the threshold
voltage. When a zero-bias voltage is applied (i.e., a logic ‘1’), NBTI stress is
actually removed, resulting in a partial recovery (i.e., a decrease) of the threshold
voltage (the recovery state). While there is no full consensus on the exact quantum-
mechanical mechanisms that govern the NBTI effects, the reactivation-diffusion
model is accepted as accurate enough for pMOS NBTI aging [1]. A simplified
version of such a model is the following:
Ea 1
Stress State W Vth D ks e .t tst r / 4
kT (9.5)
r
t trec
Recovery State W Vth D kr .1 / (9.6)
t
where ks and kr are two parameters whose magnitude depends on few technological
parameters (such as channel strain and nitrogen concentration), k is the Boltzmann
constant, T is the device temperature, Ea is the activation energy, and tst r and trcv
denote stress and recovery times, respectively. t is the free variable and expresses
the temporal evolution of the threshold voltage drift.
Equations (9.5) and (9.6) show the qualitative behavior of NBTI stress and
1
recovery: Vth increases
p during stress with t 4 dependency, it decreases during
recovery with 1 1=t dependency, and it depends exponentially on temperature
during stress. The most relevant feature of the model, however, is that there are
196 A. Calimera et al.
two different behaviors based on the state of the device. This observation, coupled
with the experimental evidence that NBTI aging is frequency-independent [1, 22],
implies that it is the total stress time that matters rather than the actual dynamics of
stress-recovery.
A generic signal applied to a pMOS transistor can then be modeled as a periodic
waveform with equivalent amount of stress time, paving the way to a probabilistic
modeling of NBTI aging. This allows to lump the models of Eqs. (9.5) and (9.6) into
a single macromodel suitable to circuit-level simulation:
1
Vth D K ˇ t 4 (9.7)
CL Vdd
dD (9.8)
.Vgs Vth /˛
where CL is the load capacitance, Vgs is the gate voltage and ˛ is a technology-
related exponent that can be approximated to 1 for sub-90 nm technology.
Because of the NBTI-induced threshold voltage increase, the “aged” delay d 0 >
d then becomes:
CL Vdd
d 0 .t / D (9.9)
.Vgs .Vth C Vth .t ///
K .ˇ t /1=4
d 0 .t / D d .1 C / (9.10)
VGT K .ˇ t /1=4
where VGT D Vgs Vth;0 and Vth;0 is the threshold voltage at time 0.
Equation (9.10) allows translating circuit operations (in terms of signal probabil-
ities) to aging (i.e., delay increase over time) for the individual gates, similarly to
what it is done for estimating dynamic power based on switching probabilities.
9 Power-Gating for Leakage Control and Beyond 197
Equation (9.7) clearly shows that, for a specific manufactured device and for specific
operating conditions (temperature and Vdd ), there is one single knob that can be used
for mitigating the aging effects: The stress probability ˇ.
Ideally, one would like to make ˇ as small as possible. This would imply having
as many 1’s in the logic network as possible for the largest possible fraction of time.
Obviously, under normal conditions, this is infeasible, because: (1) To implement
meaningful functions, circuits require logic inversion, which by definition prevents
achieving arbitrarily small probability for a predetermined signal value (0 or 1).
(2) Circuit structure affects probability. On the other hand, information theory
suggests that a signal probability of 0 or 1 carries no information—entropy is 0, and
it is maximum for a 0.5 probability. Therefore, to implement a realistic function,
there must be a fair distribution of 0’s and 1’s.
Although the ideal objective is impossible to reach, technology-independent
and technology-dependent synthesis techniques can be used to minimize the
0-probability of internal signals. Kumar et al. [23] proposed multi-level synthesis
and technology mapping algorithms that adopt a modified, NBTI-oriented metric,
while Wu and Marculescu [19] used logic restructuring and pin reordering to exploit
the fact that not all transistors are identically important in determining the delay of
a gate in the pull-up network. These strategies can reduce the aging in the typical
state of the circuit (i.e., determined by the implementation and by the most common
input patterns) and achieve a sort of local optimum.
We propose power-gating as a knob for improving over these results, aiming
at a global optimum. We suggest leveraging a well-known weakness of power-
gating: The logic values of the nodes of a power-gated block are lost when the
block is disconnected from the power supply or the ground nets. This poses serious
problems when storing values in memory elements and when interfacing power-
gated to non power-gated regions. Solutions based on the usage of special types of
memory elements [3] or by proper design of the sleep transistor cell [16] do exist.
For the combinational portion of a circuit, the behavior of the internal nodes
depends on the type of switch (footer or header) implementing power-gating. If
a footer switch is used (and thus the block is disconnected from the ground), as
described in Sect. 3, an interesting behavior occurs. Nodes that are at logic value
‘1’ before opening the switch do keep their values, whereas nodes with value ‘0’
become floating. Both the virtual ground line and the ‘0’ nodes then get charged to
‘1’ by the leakage current of the pull-up network of the cells [27]. The speed of this
charge process depends on the design of the sleep transistor cell, and can be speed
up by using a proper pull-up boosting mechanism, as shown in [7].
For the sake of illustration, Fig. 9.11 plots the signals of a simple two-inverter
circuit connected to a footer switch after the latter has been turned off (0.0 on the
timescale) [10]. We observe that the signals a and c, originally at logic value ‘1’,
stay unchanged, while signal b, at ‘0’ when the sleep signal is activated, goes to ‘1’
quite abruptly, and it reaches about 85 % of Vdd in a few tens of nanoseconds.
198 A. Calimera et al.
a b c
Virtual
Sleep Ground
What it matters for our purposes is that all the nodes inside the gated block
region (more or less quickly) reach a logic value of ‘1’, that is, the gated block
recovers from aging during the standby intervals. This is clearly a non-logical state
that cannot be obtained by any mechanism that operates on the circuit function or
the input data, and is therefore orthogonal to such design approaches.
From the analysis of the previous section, two key issues deserve a deeper
evaluation. First, it is evident that the benefit in terms of aging goes together with the
potential leakage reductions: If the gated block is put in a standby state sporadically,
the aging reduction will be marginal. Clearly, the amount of time spent in the
standby state is not a quantity that is controllable by the designer and it is determined
by the environment. Therefore, it is important to parameterize the aging in terms of
this quantity. Second, the implementation of power-gating does not come for free.
As discussed in Sect. 3, the addition of a footer switch in series with the pull-down
network increases the on-resistance of each cell and results into a (nominal, i.e., at
time zero) delay penalty. Since by tackling aging we are trying to compensate the
9 Power-Gating for Leakage Control and Beyond 199
1.15E+01
1.13E+01 Psleep=0.7
1.10E+01
1.08E+01 Psleep=0.8
Delay
1.05E+01 Psleep=0.9
1.03E+01 No PG
1.00E+01
9.75E+00 β=0.5
9.50E+00
00
06
07
07
07
07
07
07
07
07
07
E+
E+
E+
E+
E+
E+
E+
E+
E+
E+
E+
00
50
75
75
75
75
75
75
75
75
75
0.
7.
1.
2.
3.
4.
5.
6.
7.
8.
9.
Time (s)
increase of delay over time, it is fundamental considering the fact that in a power-
gated circuit we start from a larger nominal delay value.
Equation (9.10) can be adapted so as to incorporate the above described effects
by adding two more parameters: Psleep , the probability of the sleep signal (0 means
always active, 1 always in standby), and , the delay penalty the due to the addition
of the sleep transistor (i.e., ds D d .1 C / is the time-zero delay of the gated
block).
The new expression of the delay model of a gated block becomes:
K .ˇ .1 Psleep / t /1=4
ds0 .t / D ds .1 C / (9.11)
VGT K .ˇ .1 Psleep / t /1=4
Notice that the stress probability ˇ is now modulated by the complement of the
sleep probability .1 Psleep /; this is equivalent to an “effective” stress probability
(ˇ 0 D ˇ .1 Psleep /, with ˇ 0 t is now the effective stress time.
Figure 9.12 plots Eqs. (9.10) and (9.11); the curve for ds0 .t / is parameterized with
respect to Psleep , and it refers to a specific value of . Both curves for d.t / and ds0 .t /
refer to a value of ˇ D 0:5. The plot is qualitative just for the sake of illustration of
how the aging profiles change based on the various parameters.
The aging curves for power-gating start at a higher time-zero delay value (ds ), but
they exhibit an amount of recovery that is proportional to the percentage of standby
time Psleep . The diamonds in correspondence to the intersections between the non
power-gated curve and the power-gated ones show the breakeven points, that is, the
points in time after which the power-gated circuits start aging more slowly than
their non power-gated counterparts. Clearly, the intersection moves earlier in time
as Psleep increases, denoting better aging profiles.
200 A. Calimera et al.
There are three parameters that affect the above analysis: Psleep ; ˇ; and . The
first two are not under the designer’s control and depend on the functional and
environmental characteristics of the circuit; for a given implementation, they assume
a well defined value. On the contrary, is a design variable; its magnitude is related
to the sleep transistor size [2–31]: A larger (smaller) implies a smaller (larger)
sleep transistor. The ideal case of D 0 would correspond to a transistor of infinite
size.
5.3.2 Architectures
The discussion above assumes the entire logic block being power-gated by a single
sleep transistor. But, the aging benefits of power-gating are proportional to ; a
large value of ds may result in a breakeven point which might be beyond the
typical lifetime horizon of the circuit. Therefore, it is essential to keep as small as
possible.
One way for achieving this is to selectively apply power-gating. A first option is
partial power-gating, in which power-gating is applied only to the non-critical gates
of a circuit (left schematic of Fig. 9.13). Assuming a coarse-grain implementation
of power-gating, gating only a portion of the circuit requires the interaction with
the physical placement of the cells [11, 31]. By not slowing down critical gates,
this solution has the same delay as the nominal one (ds
d ) and it requires a
sleep transistor width, W , smaller than that required for power-gating the entire
circuit. However, from the aging viewpoint, partial power-gating is almost identical
to the non power-gated case. In fact, keeping the critical gates un-gated means that
they will age as in the original circuit. Then, partial power-gating allows achieving
leakage reductions at zero delay overhead, but it does not yield any aging benefit.
To get concurrent leakage and aging benefits, we can adopt a clustered architec-
ture similar to the one of Sect. 4. The entire circuit is power-gated using two sleep
transistors (see the right part of Fig. 9.13): One for the critical gates (of size WC )
and another one for the non-critical gates (of size WNC WC ) [10]. The idea
is to decouple the sleep transistor problem by using a small transistor for the non-
critical gates (with the only purpose of saving leakage) and a larger one with smaller
performance penalty for critical cells. If WC W , with the whole circuit gated, we
can achieve leakage savings comparable to the case of full power-gating.
9 Power-Gating for Leakage Control and Beyond 201
power-gated circuit). The results in the chart represent an average over a set of
standard benchmarks and some medium-sized industrial designs, and refer to a
45 nm industrial CMOS technology.
We observe that the PG scheme does not fully exploit the huge leakage reduction
(about 88 %) for lifetime extension; in fact, only a 57 % increase is achieved.
Conversely, the CLUSTERED architecture yields a 5 increase with respect to the
non-gated version, with a negligible loss in leakage reduction (about 0.5 % penalty
with respect to PG). These numbers demonstrate experimentally the effectiveness
of clustered power-gating as a tool for simultaneous leakage power minimization
and aging effects mitigation.
Conclusions
There is wide consensus within the electronics industry that the scaling
process of the CMOS technology will continue as far as the fundamental
barrier of the atom size will be reached. Therefore, CMOS-based circuits and
systems represent the future vehicle for digital applications of the next decade.
Unfortunately, nano-scale CMOS technologies show intrinsic mechanisms,
like internal leakage consumption and parametric variations, which make their
use extremely challenging.
In this work, we explored the possibility of exploiting low-power tech-
niques for concurrent leakage optimization and variability compensation,
and both static (i.e., due to process variation) and dynamic (i.e., due to
NBTI-induced aging mechanisms) variability have been considered. More
specifically, we have shown how power-gating, when implemented through
a clustered strategy, offers a suited performance control knob to compensate
process variations, as well as a natural solution for reducing NBTI effects.
(continued)
9 Power-Gating for Leakage Control and Beyond 203
References
15. D. Flynn, R. Aitken, A. Gibbons, K. Shi (2007) Low Power Methodology Manual, Springer-
Verlag.
16. S. Henzler, G. Georgakos, M. Eireiner, T. Nirschl, C. Pacha, J. Berthold, D. Schmitt-Landsiedel
(2006) “Dynamic state-retention flip-flop for fine-grained power gating with small design and
power overhead,” IEEE Journal of Solid-State Circuits, 41(7):1654–1661.
17. ITRS (2009) “Process integration, devices & structures,” in International Technology Roadmap
for Semiconductors, ITRS.
18. T. Jhaveri, V. Rovner, L. Liebmann, L. Pileggi, A. Strojwas, J. Hibbeler (2010) “Co-
optimization of circuits, layout and lithography for predictive technology scaling beyond
gratings,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,
29(4):509–527.
19. W. Kai-Chiang, D. Marculescu (2009) “Joint logic restructuring and pin reordering against
NBTI-induced performance degradation,” IEEE Design, Automation and Test in Europe
(DATE’09), pp. 75–80.
20. J. Kao, S. Narendra, A. Chandrakasan (1998) “MTCMOS hierarchical sizing based on
mutual exclusive discharge patterns,” ACM/IEEE Design Automation Conference (DAC’98),
pp. 495–500.
21. V. Khandelwal, S. Srivastava (2007) “Leakage control through fine-grained placement and
sizing of sleep transistors,” IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 26(7):1246–1255.
22. S. Kumar, C. Kim, S. Sapatnekar (2006) “An analytical model for negative bias temperature
instability,” IEEE/ACM International Conference on Computer-Aided Design (ICCAD’06),
pp. 493–496.
23. S. Kumar, C. Kim, S. Sapatnekar (2007) “NBTI-aware synthesis of digital circuits,” ACM/IEEE
Design Automation Conference (DAC’07), pp. 370–375.
24. S. Kumar, C. Kim, S. Sapatnekar (2009) “Adaptive techniques for overcoming performance
degradation due to aging in digital circuits,” IEEE Asia and South Pacific Design Automation
Conference (ASPDAC’09), pp. 284–289.
25. M. Lavin, F. L. Heng, G. Northrop (2004) “Backend CAD flows for restrictive design rules,”
IEEE/ACM International Conference on Computer-Aided Design (ICCAD’04), pp. 739–746.
26. S. Mahapatra, D. Saha, D. Varghese, P. Kumar (2006) “On the generation and recovery of
interface traps in mosfets subjected to NBTI, FN, and HCI stress,” IEEE Transactions on
Electron Devices, 53(7):1583–1592.
27. E. Pakbaznia, F. Fallah, M. Pedram (2008) “Charge recycling in power-gated CMOS cir-
cuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,
27(10):1798–1811.
28. K. Roy, S. Mukhopadhyay, H. Mahmoodi-Meimand (2003) “Leakage current mechanisms
and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proceedings of the
IEEE, 91(2):305–327.
29. A. Sathanur, A. Pullini, L. Benini, A. Macii, E. Macii, M. Poncino (2008) “Optimal sleep
transistor synthesis under timing and area constraints,” ACM/IEEE Great Lakes symposium on
VLSI (GLSVLSI’08), pp. 177–182.
30. A. Sathanur, L. Benini, A. Macii, E. Macii, M. Poncino (2011) “Fast computation of discharge
current upper bounds for clustered power-gating,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 19(1):146–151.
31. A. Sathanur, L. Benini, A. Macii, E. Macii, M. Poncino (2011) “Row-based power-gating:
A novel sleep transistor insertion methodology for leakage power optimization in nanome-
ter CMOS circuits,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
19(3):469–482.
32. L. D. Silva, A. Calimera, A. Macii, E. Macii, M. Poncino (2011) “Power efficient variability
compensation through clustered tunable power-gating,” IEEE Journal of Emerging and
Selected Topics in Circuits and Systems, 1(3):242–253.
9 Power-Gating for Leakage Control and Beyond 205
33. D. Sylvester, K. Agarwal, S. Shah (2008) “Variability in nanometer CMOS: Impact, analysis,
and minimization,” Integration - The VLSI Journal, 41(3):319–339.
34. J. Tschanz, J. Kao, S. Narendra, R. Nair, D. Antoniadis, A. Chandrakasan, V. De (2002)
“Adaptive body bias for reducing impacts of die-to-die and within-die parameter vari-
ations on microprocessor frequency and leakage,” IEEE Journal of Solid-State Circuits,
37(11):1396–1402.
35. J. Tschanz, S. Narendra, R. Nair, V. De (2003) “Effectiveness of adaptive supply voltage and
body bias for reducing impact of parameter variations in low power and high performance
microprocessors,” IEEE Journal of Solid-State Circuits, 38(5):826–829.
Chapter 10
Soft Error Rate and Fault Tolerance
Techniques for FPGAs
1 Introduction
Well-known Flash-based FPGAs are from Microsemi, which presents the fami-
lies PROASIC3 and SmartFusion [7]. The reconfigurable array is composed of
VersaTiles and routing resources that are programmable by turning ON or OFF
switches implemented by floating gate (FG) transistors (NMOS transistor with a
stacked gate). The FG switch circuit is a set of two NMOS transistors: (1) a sense
transistor to program the floating gate and sense the current during the threshold
210 F. Kastensmidt and R. Reis
Fig. 10.1 Example of an SEU occurrence in a LUT and in the routing of an SRAM-based FPGA
Word
Switch Out
Fig. 10.4 Probability of a bit-flip as a function of the mean deposited charge by an incoming
particle for a 6T-SRAM cell before and after stress
Table 10.2 List of mitigation techniques that can be applied by the User in
Designs targeting FPGAs
Mitigation technique Abstraction level SET SEU
Local TMR HDL X
Global TMR HDL X X
Large grain TMR HDL X X
Voter insertion HDL X X
Reliability-Oriented place and Route Algorithm FPGA Flow X X
Temporal redundancy HDL X
Embedded processor redundancy HDL X X
Scrubbing System X
FPGA
TMR Output Voter
REF
R0 R1 R2 MAJ
0 0 0 0
0 0 1 0
0 1 0 0
TMR flip-flop
0 1 1 1
1 0 0 1
1 0 1 0
tr0 MAJ 1 1 0 0
clk0 R0
R0 R1 R2 MAJ 1 1 1 0 3-state_0
O_voter
0 0 0 0 LUT: 00011000_00011000 R0
0 0 1 0
tr1 MAJ 0 1 0 0
clk1 R1
0 1 1 1 O_voter 3-state_1
1 0 0 0
1 0 1 1 R1
tr2 MAJ 1 1 0 1
clk2 R2
1 1 1 1 3-state_2
O_voter
LUT: 00010111_00010111 R2
CLB flip-flop
to correct the bit-flip of the flip-flops to avoid accumulation of errors. The voter used
in the output is based on a minority voter, where it blocks the output if this one is
different to the other two.
10 Soft Error Rate and Fault Tolerance Techniques for FPGAs 215
140
Full device scrub time (ms)
120
100
80
60
40
20
0
Virtex Virtex E Virtex-II Virtex-II Virtex-4 Virtex-5 Virtex-6 Virtex-7 Spartan-3 Spartan-3 Spartan-6
(250nm) (180nm) (150nm) Pro (90nm) (65nm) (40nm) (28nm) E/A (90nm) (45nm)
(130nm) (90nm)
Fig. 10.7 Full configuration or scrubbing time of different SRAM-based FPGAs from Xilinx
FPGA
TMR Majority Voter
REDUNDANT
TMR register with
Majorty Voter
LOGIC (tr0)
TMR Output
REDUNDANT
LOGIC (tr1) tr1 tr1 tr1
REDUNDANT
LOGIC (tr2) tr2 tr2 tr2
generates the fault-free p-output, n error status flags (ESF), and a non-masked fault
signal (NMF). In this scheme, the system allows for the accumulation of defective
modules, while remaining at least two modules without fault. SAv is a majority
voter, considering as population fault-free modules.
The test of FPGAs under radiation depends on a test plan developed for each type
of FPGA and design architecture. Here we will detail the radiation test for SRAM-
based FPGAs. There are two types of tests: the static test and the dynamic test.
218 F. Kastensmidt and R. Reis
TR0 TRV0
combinational logic D0 clk0 TR1 voter OK
TR2
E0
D1 TR1 TRV1
combinational logic clk1 X TR0 voter OK
TR2
E1
D2 TR2 TRV2
combinational logic clk2 TR0 voter OK
TR1
E2
clk+2d clk+2d
Fig. 10.13 Temporal Redundancy or SET Filtering Technique with delay d applied in the clock
trees
The static test experiment consists of configuring the FPGA with a golden bitstream
containing the test-design and then constantly read back the FPGA configuration
memory with the Xilinx iMPACT tool through the JTAG interface. In the experiment
control computer, the golden bitstream is compared against the readback bitstream.
If differences are found, the FPGA is reconfigured with the golden bitstream and
the differences are stored in the computer. Faults are defined as any bit-flip in the
configuration memory detected by the readback procedure. In this case, it is possible
to calculate the upset rate in the configuration memory bits for that specific particle
flux, expressed by particles/s/cm2 .
The cross-sections (¢) is the sensitive area of a circuit, where one particle
may cause an upset event, and it is calculated using Eq. (10.1), where flu-
ence D flux time, expressed by particles/cm2 . Cross-section is expressed in cm2 .
#events
¢D (10.1)
fluence
10 Soft Error Rate and Fault Tolerance Techniques for FPGAs 219
There are two types of cross-section: static and dynamic cross-section. Static
cross-section, also known as device cross-section (¢ device) is defined as the ratio
between the number of upsets in the configuration memory bits of the SRAM-based
FPGA (events) and the fluence of hitting particles. Usually, it is used a normalized
cross-section or bit cross-section (¢ bit), where the cross-section is divided by
the total amount of configuration memory bits. Static cross-section quantifies the
sensitivity of the FPGA technology to a specific radiation source.
On the other hand, dynamic cross-section is defined as the ratio between the
number errors observed at the output at design configured into the SRAM-based
FPGA (events), divided by the fluence of hitting particles. Dynamic cross-section
quantifies the sensitivity of the implemented design application to any specific
radiation source. The rate at errors is defined as the soft error rate (SER). In this
case, the expected error rate is much lower than the static test. Based on the Xilinx
Reliability Report [15], in average it is necessary 20 upsets in the configuration
memory bits to provoke one error in the design output. This relation may of course
vary according to the logic density, mapping, routing and the chosen architecture
for the design. This number can triplicate when XTMR is used. And the relation
can increase even higher, 6 or more times, if DTMR or nMR is used.
Notice that SER is proportional to both device size sensitivity (cross-section) and
flux as shown in Eq. (10.2).
When a charged particle (as neutron, protons or heavy ions) hits a device, part of
its charge is deposited in the device. This is known as linear energy transfer (LET)
and expresses the energy loss per unit length (dE/dx) of a particle and is a function of
the mass and energy of the particle as well as the target material density. The units of
LET are commonly expressed as MeV–cm2 /g. Radiation experiments with charged
particles commonly relate the relation between cross-section and LET, which also
depends on the incidence angle.
Fig. 10.14 illustrates a setup experiment under neutron at ISIS Facility in United
Kingdom, where we could measure the static and dynamic cross-section of SRAM-
based FPGAs and later on by knowing the sea level neutron flux, we can infer the
SER of the device and circuit application running at ground level.
For example, the device was exposed for 22 min in a neutron flux of 4.11 104
and 70 events were counted. The fluence can be calculated and consequently by
using Eq. (10.1), the cross-section obtained is 1.29 10-6 cm2 . And by using Eq.
(10.2), the SER is 5.3 10-2 errors per second. By knowing particles flux at see
level is 13 neutrons/cm2 /h, the expected error rate at sea level can be infer to
1.29 10-6 cm2 13 neutrons/cm2 /h D 1.67 10-5 errors per hour.
220 F. Kastensmidt and R. Reis
Conclusions
Each type of FPGAs presents a different susceptibility to soft errors and
according to its upset rate and architecture, distinct fault tolerance techniques
may be chosen. For experiments under neutron-induced faults, it is possible
to calculate the static and dynamic cross-section of SRAM-based FPGAs and
to analyze its tolerance to multiple and accumulated upsets. The static cross-
section can be measured for each SRAM-based FPGA device and the dynamic
cross-section must be measured for each fault tolerant design implemented in
the FPGA. For a certain device with a specific static cross-section, there will
be many dynamic cross-sections according to the fault tolerance technique
chosen.
References
1. J. Barth, “Applying Computer Simulation Tools to Radiation Effects Problems”, in: IEEE
Nuclear Space Radiation Effects Conference Short Course, NSREC, 1997.
2. O. Flament, J. Baggio, C. D”hose, G. Gasiot, J.L. Leray, “14 MeV neutron-induced SEU in
SRAM devices,” In Nuclear Science, IEEE Transactions on, vol. 51, no. 5, pp. 2908–2911,
Oct. 2004.
3. M. Berg, “Fault tolerance implementation within SRAM based FPGA designs based upon the
increased level of single event upset susceptibility,” On-Line Testing Symposium, IOLTS 2006.
10 Soft Error Rate and Fault Tolerance Techniques for FPGAs 221
1 Introduction
The past few decades have seen the evolution of semiconductor industry driven
by technology scaling. Miniaturization of bulk Metal Oxide Semiconductor Field
Effect Transistors (MOSFETs) along with scaling of power supply voltage (VDD )
has provided the benefits of higher performance, lower power and larger integration
density. However, for the past few technology generations, scaling of transis-
tors has been becoming increasingly difficult. Short channel effects (SCEs) like
drain-induced barrier lowering [1], device threshold voltage (VT ) roll-off [1] and
degradation in the ratio of ON current (ION ) to OFF current (IOFF ) have started
to manifest themselves to a larger extent in scaled MOSFETs. Another critical
challenge in scaled technologies is the increasing impact of process variations on
the device, circuit and system performance due to random dopant fluctuation, line
edge roughness etc. [2]. Such detrimental effects in scaled technologies make the
design and optimization of devices and circuits extremely challenging.
Fig. 11.1 Device structures with single gate viz. (a) bulk MOSFETs (b) partially depleted (PD)
SOI MOSFETs (c) fully depeleted (FD) SOI MOSFETs and multiple gates viz. (d) planar double
gate (DG) MOSFETs (e) FinFETs and (f) gate-all around MOSFETs. The arrows show the electric
field lines in the structures
FinFETs exhibit reduced short channel effects (compared to bulk MOSFETs) viz.
lower drain induced barrier lowering (DIBL), lower sub-threshold swing (SS) and
higher ratio of ON current to OFF current (ION /IOFF ) due to ultra-thin silicon body.
Lower fin thickness (tSi ) limits the electric field penetration from drain to source and
enhances the scalability of FinFETs. However, tSi cannot be scaled indefinitely. As
tSi decreases, quantum confinement effects start to play a bigger role, increasing
the threshold voltage (VT ) of the transistors and leading to loss in the device
performance. Moreover, device variability increases at lower tSi .
Ultra-thin body (UTB) in FinFETs along with channel control by two gates make
it feasible to use undoped (intrinsic) body, while still achieving excellent device
electrostatics. Due to undoped body in FinFETs, the variations due to random
dopant fluctuations (a major source of random variations in bulk MOSFETs) is
eliminated. However, due to UTB, an additional source of variation viz. body
thickness variation is introduced in FinFETs.
Short channel effects (SCEs) in FinFETs can also be controlled by reducing gate
dielectric thickness (tOX ). Lower tOX results in superior gate control of the channel,
but leads to increase in the gate leakage (IG ). Use of high-k dielectrics expands the
device design space by allowing the scaling of effective oxide thickness (EOT) with
higher physical thickness of the dielectric, limiting the gate leakage current in scaled
FinFETs.
Another device parameter which plays an important role in determining the char-
acteristics of FinFETs is the gate-source/drain (S/D) overlap (LOVS/D ). Larger LOVS/D
leads to decrease in effective channel length (LCH ) for the same gate length (LG )
which worsens SCEs in the device. In case the electrostatic integrity of the device
cannot be maintained by scaling tOX and/or tSi , a negative overlap between gate
and S/D, called underlap (LUN ) may be introduced to increase the effective channel
length without increasing LG . This achieves improved electrostatics and higher ION -
IOFF ratio. Gate workfunction (˚ G ) is another important design parameter which
can be changed to obtain the desired VT of the transistor [10].
The width of FinFET is along the height of the fin (HFIN -Fig. 11.1e) and for
ensuring mechanical stability of the device, the aspect ratio HFIN :tSi is limited [11].
For a particular technology, HFIN is fixed and hence, the width of the device can
only be increased in quanta of HFIN . This effect is known as width quantization.
Restricted device sizing due to width quantization in FinFETs leads to constrained
design space of FinFET-based circuits.
The effect of width quantization becomes even more critical for circuits like
SRAMs and flip-flops in which transistor sizing has a significant effect on the
226 S.K. Gupta and K. Roy
circuit functionality [12]. Conflicting requirements for transistor sizes for stable read
and write operations in SRAMs further aggravates the challenge posed by width
quantization.
The adverse effects of width quantization on FinFET-based circuits can be
mitigated by employing device-circuit co-design techniques [13–24]. Figure 11.2
illustrates different technology options provided by FinFETs. The two gates of
FinFETs can be tied together or controlled independently to enhance the design
flexibility of FinFET-based circuits. In addition, device parameters like fin thickness,
gate-source/drain underlap, gate workfunction, fin orientation and fin height can be
optimized considering the circuit and system requirements. This chapter explores
such device-circuit interactions with a focus on FinFET-based SRAMs. Before we
present the discussions on the device-circuit co-design methodologies, let us review
the fundamentals of 6 T SRAMs.
Suppose the cell stores ‘0’ i.e. VQ D 0 and VQB D VDD . (Here, VQ and VQB are
voltages at nodes Q and QB). During the read operation, bit-lines BL and BLB are
pre-charged to VDD and word-line WL is asserted. VQ rises to some positive voltage
VREAD determined by the relative strengths of AXL and NL. At the same time BL
starts discharging while BLB remains at VDD . The difference in the voltages of BL
and BLB is amplified by the sense amplifier to obtain the value stored in the cell.
Since the storage node voltages are disturbed during the read, read stability of the
cell is less than the hold stability. If VREAD becomes greater than the trip point of
the inverter formed by AXR, PR and NR during read, the contents of the cell can
flip, causing a read failure [25]. During the write operation, BL is charged to VDD
and BLB is discharged to GND. On asserting WL, VQB is discharged to VWRITE ,
determined by the relative strengths of PR and AXR. Successful write occurs if
VWRITE is less than the trip point of the inverter formed by PL, AXL and NL.
With the basic understanding of FinFETs and cell operation of 6 T SRAMs, let
us discuss the design techniques for FinFET-based SRAMs. We describe the co-
optimization of fin ratio and tSi in Sect. 2, joint optimization of HFIN -tSi -tOX -VDD -VT
in Sect. 3, analysis of fin orientation in Sect. 4, spacer thickness optimization in
Sect. 5 and independent gate FinFETs in Sect. 6.
Table 11.1 Comparison of FinFETs with different fin thickness (tSi ) at iso-
IOFF [13]. (IOFF D 25 nA/m)
tSi (nm) ˆG (eV) IDSAT (mA/m) IDLIN (mA/m) DIBL (mV/V)
7 4.61 2.18 0.88 51
9 4.73 1.95 0.92 115
11 4.80 1.56 0.8 194
tSi under iso-IOFF conditions, which is achieved by appropriately adjusting the gate
work-function (˚ G ). At iso-IOFF , lower short channel effects (SCE) in devices
with a reduced tSi tend to increase drain current (ID ). However, lower tSi results
in mobility degradation [26] and increase in source/drain (S/D) resistance, which
tends to decrease ID . At high VDS (saturation region), the effect of lower SCE
has a dominant effect on ID due to which saturation current (IDSAT ) increases with
decreasing tSi (Table 11.1 [13]). However, at low VDS (linear region), the impact of
SCE, mobility degradation and S/D resistance on ID are comparable. Due to this,
maximum value of linear current (IDLIN ) is obtained for tSi D 9 nm in Table 11.1.
With this discussion in mind, let us describe the effect of tSi on the cell stability,
performance and leakage of 6 T SRAMs.
During read, NL (Fig. 11.3) operates in the linear region while AXL is in
saturation. Due to increase in IDSAT at lower tSi and comparable IDLIN for different
tSi at iso-IOFF , the relative strength of AXL with respect to NL increases as tSi is
reduced. This results in higher VREAD which tends to lower the read static noise
margin (SNM). However, lower tSi also leads to higher ION -IOFF ratio, which tends
to increase the read SNM. Depending on the relative impact of the two effects
on read stability, read SNM might increase or decrease with tSi at iso-IOFF . The
access time of SRAM reduces with decreasing tSi due to increase in the strength
of the access transistor. Hold SNM increases for lower tSi due to higher ION -IOFF
ratio of the transistors. Write-ability of the cell reduces with decreasing tSi due to
reduction in the relative strength of AXR with respect to PR. Moreover, higher ION -
IOFF ratio at reduced tSi makes it difficult to flip the contents of the cell and lowers
the write-ability. Since the comparison is performed under iso-IOFF conditions, the
cell leakage is similar for different tSi .
Let us, now, discuss the impact of fin ratio on SRAM characteristics. We define
NAC , NPD and NPU to be the number of fins in the access, pull-down and pull-up
FinFETs, respectively. Increasing NPD :NAC results in lower VREAD and higher read
stability. Improvement in read stability can also be achieved by increasing NPU :NPD
which leads to higher logic threshold voltage (VM ) of the inverter formed by PR and
NR (see Fig. 11.3). Write-ability of the cell can be increased by increasing NAC :NPU ,
which results in lower VWRITE or by increasing NPU :NPD which results in higher
VM of the inverter formed by PL and NL. Hold stability improves with increasing
NPU :NPD . Note, the conflicting requirements for the fin ratios for achieving high
read stability, write-ability and hold stability. As an example, if NAC is decreased
to increase NPD :NAC and improve read stability, then NAC :NPU decreases, resulting
11 Low Power Robust FinFET-Based SRAM Design in Scaled Technologies 229
Fig. 11.5 Layouts of FinFET-based 6 T SRAM showing increase in area with increasing number
of fins: (a) layout with using one/two fins in the transistors (b) layout for a high performance cell
Fig. 11.6 (a) Read SNM and (b) Write margin as a function of fin ratio and fin thickness [13]
is shown in Fig. 11.6 [13] under iso-IOFF conditions. Increase in read SNM with
decreasing tSi can be observed and is attributed to improved ION -IOFF ratio for
low tSi , as explained before. However, higher ION -IOFF ratio for low tSi results in
degradation in write margin (WM). Read SNM and WM for different fin ratios
are also shown in Fig. 11.6. Fin ratio NP U W NAC W NPD D 1 W 1 W 2 provides
the optimal balance between read stability, write-ability and cell area. The plots
in Fig. 11.6 can be used for the co-optimization of tSi and fin ratio to achieve the
optimal cell stability of FinFET-based SRAMs.
Fig. 11.7 (a) Cell leakage (b) read SNM (c) read and write access time and (d) cell charge for
different design points for Device-VT and Device-VT -VDD co-optimizations [14]
Fig. 11.10 (a) Read SNM (b) write margin (c) hold SNM and (d) access time for different fin
orientations [15]
Fig. 11.11 (a) Conventional FinFET (b) Symmetric Spacer Extension (SSE) FinFET (c) LgDev
FinFET with larger gate length and (d) Asymmetric Drain Spacer Extension (ADSE) FinFET
Figure 11.11b shows the FinFET structure with symmetric gate underlap (Sym-
metric Spacer Extension (SSE) FinFET) [16]. Gate sidewall spacer thickness is
increased by
LSP on both sides of the gate to introduce an equal gate underlap
on the source and drain sides. Gate underlap results in (a) increase in the effective
channel length of the device and (b) change in the device electrostatics due to weaker
gate control in the underlapped portion of the channel. Due to these two effects, ION
degrades with increasing
LSP . However, at the same time, FinFETs with higher
LSP exhibit lower capacitance due to the reduction in the gate overlap. ION and
236 S.K. Gupta and K. Roy
capacitance versus
LSP is shown in Fig. 11.12. It can be observed that for small
LSP , ION degradation is small and is due to increase in the effective channel length.
However, for large
LSP , sharp decrease in ION is observed and is attributed to
a significant loss of gate control over the channel, especially on the source side.
This effect is explained in a greater detail in the next sub-section. Figure 11.12
also shows a decrease in gate and drain capacitance (CG and CD ) with increasing
LSP . The net effect of decreasing ION and capacitance on intrinsic device delay
(CD V/ION ) can be observed in Fig. 11.12 which shows a minimum in CD V/ION for
an optimal
LSP . Thus, symmetric spacer optimization can lead to lower intrinsic
device delay. Moreover, increase in the effective channel length leads to lower
short channel effects viz. lower DIBL, lower sub-threshold swing and reduced sub-
threshold leakage with increasing
LSP (Fig. 11.13). Reduced gate overlap also
results in lower edge tunneling current (IET ), as shown in Fig. 11.13.
Let us now explain the impact of symmetric underlap in FinFETs on SRAM
stability, leakage and performance. Improvement in short channel effects of the
transistors with increasing
LSP results in higher read and hold SNM, leading to
reduction in the data retention voltage (DRV). At the same time, lower intrinsic
11 Low Power Robust FinFET-Based SRAM Design in Scaled Technologies 237
Table 11.3 Comparison of SSE FinFET SRAM with conventional (Conv.) and LgDev FinFET
SRAMs [16]
Metric Conv. (LSP D 0) SSE (LSP D 12 nm) LgDev (LSP D 12 nm)
Read SNM (mV) 129 132 137
Write Time (ps) : Nominal 25.4 25.7 23.8
:Worst Case 54.2 42.3 52.9
Worst Case Data 220 160 150
Retention Voltage (mV)
Leakage Power (nW): PSUB 13.05 2.52 2.10
(T D 27 ı C) : PGATE 34.05 13.78 60.9
:PTOT 47.1 16.3 63
delay of the optimized devices (as explained earlier) and lower storage node
capacitance result in lower write time. Moreover, lower sub-threshold leakage leads
to significant reduction in the cell leakage. This comes at the cost of (a) increase
in cell area due to increase in the device footprint and (b) higher access time due
to lower strength of the access transistors. Table 11.3 shows the comparison of
conventional FinFETs and SSE FinFETs [16]. Comparison is also performed with
a device which has the same spacer thickness as conventional FinFETs but with
channel length equal to that of SSE FinFET (see LgDev in Fig. 11.11c). Note, LgDev
has a larger gate length compared to SSE FinFET. Larger read SNM and lower DRV
is observed in LgDev based SRAM, compared to SSE FinFET SRAM because ION -
IOFF ratio in LgDev is higher due to superior gate control. However, since the overlap
capacitance is higher in LgDev, the worst case write time is larger compared to SSE
FinFET based SRAM. Moreover, large gate length results in higher gate current in
the ON state due to larger cross-sectional area of the gate. This leads to higher cell
leakage in LgDev FinFET SRAM.
In the previous sub-section, we discussed the pros and cons of SSE FinFETs. One of
the drawbacks of introducing gate underlap on the source side is that carrier injection
from the source to the channel is significantly affected by the underlap, which results
in a large degradation in ION for large
LSP . Instead of introducing underlap on both
sides, if the underlap is introduced only on the drain side, the degradation in ION
can be mitigated. Figure 11.11d shows the FinFET structure with asymmetric gate
underlap (Asymmetric Drain Spacer Extension (ADSE) FinFET) [17]. Asymmetry
in the structure of ADSE FinFETs results in unequal ID for VDS > 0 and VDS < 0.
This asymmetry is used to mitigate the read-write conflict in 6 T SRAMs. In order
to explain the benefits of ADSE FinFET based SRAM, let us discuss the impact of
asymmetric gate underlap on the device characteristics.
238 S.K. Gupta and K. Roy
Fig. 11.14 Conduction band profile of ADSE FinFETs (a) at different VGS (b) for different
LSP
(c) at different positive VDS and (d) at different negative VDS [17]. The thick line on the device
symbol indicates the underlapped terminal
Fig. 11.15 Drain current versus
LSP of ADSE and LgDev FinFETs for (a) VDS > 0 and
(b) VDS < 0 [17]
negative VDS (Fig. 11.14d). The difference in the dependence of extra potential
barrier and hence, LKP , on the polarity of drain bias creates an asymmetry in current
for positive and negative VDS .
In addition, ADSE FinFETs show improved short channel effects due to increase
in the channel length. Reduction in IET is also observed with increasing
LSP for
positive VDS [17]. The comparison of ADSE FinFETs with conventional FinFETs
and iso-device footprint SSE FinFETs is summarized in Table 11.4. ADSE FinFETs
show improvement in short channel characteristics and lower gate-drain capacitance
compared to iso-area SSE FinFETs. In addition to superior device characteristics,
ADSE FinFETs exhibit asymmetry in ID for VDS > 0 and VDS < 0. This unique
feature of ADSE FinFETs is exploited to mitigate the read-write conflict in 6 T
SRAMs, which we discuss next.
Figure 11.16 shows the ADSE FinFET based 6 T SRAM. The underlapped
terminal is indicated by the thick lines. During the read operation, VDS across
AXR is negative, which results in lower ID due to extra potential and hence, lower
strength of AXR relative to NR. During the write operation, VDS across AXL is
positive which reduces the effect of the extra potential barrier on ID and increases the
strength of AXL relative to PL. This leads to the mitigation of the read-write conflict
(Fig. 11.17). Moreover, improved short channel effects in ADSE FinFETs results in
lower cell leakage (ILEAK ) and hold stability. At the same time, lower storage node
capacitance results in lower write time. However, at large
LSP , degradation in ION
240 S.K. Gupta and K. Roy
Fig. 11.16 Schematic of ADSE FinFET based 6 T SRAM showing current direction during
(a) read and (b) write [17]
becomes dominant over the reduction in capacitance which leads to increase in write
time. Weak access transistors during read leads to increased access time. ADSE
FinFETs also have larger cell area due to increased device footprint. Figure 11.17
also shows comparison between ADSE and LgDev FinFET SRAMs. ADSE FinFET
based SRAM exhibits superior read and write stability compared to LgDev based
SRAM due to asymmetry in ADSE FinFETs which mitigates the read-write conflict.
Moreover, ADSE FinFET SRAM shows lower cell leakage. IGATE (gate leakage
current) decreases with
LSP for ADSE FinFET based SRAM. This is because of
reduced edge tunneling current and negligible change in direct tunneling current
(since gate length LG does not change with
LSP ). On the other hand, LgDev based
SRAM shows an increase in IGATE due to increase in LG which results in larger
direct tunneling current. Also, larger storage node capacitance in LgDev leads to
larger write time. However, LgDev FinFET SRAMs show lower access time than
ADSE FinFET SRAMs due to higher strength of the access transistors. Moreover,
superior short channel control in LgDev results in larger hold stability.
In the previous section, we discussed that transistors with asymmetric gate underlap
leads to the mitigation of the read-write conflict in SRAMs. In this section, we
discuss another technique of introducing asymmetry in the transistor by unequally
doping the source and drain of the transistors, as shown in Fig. 11.18. Similar to
the ADSE FinFETs, AD FinFETs exhibit unequal currents for positive and negative
VDS [23]. The current flowing from the terminal with lower doping (i.e. when the
terminal with larger doping acts as the source) is higher than the current flowing
in the other direction (i.e. when the terminal with lower doping acts as the source).
11 Low Power Robust FinFET-Based SRAM Design in Scaled Technologies 241
Fig. 11.17 (a) Read SNM (b) write margin (c) hold SNM (d) access time (e) write time and (f) cell
leakage versus
LSP for ADSE FinFET and LgDev FinFET SRAMs [17]
The asymmetry in current increases as the doping concentration of the drain terminal
is reduced, with the source doping kept fixed. In order to achieve the mitigation of
the read-write conflict, the terminal with lower doping is connected to the storage
nodes (Fig. 11.18). During the read operation, the strength of the access transistor
reduces, increasing the read stability. During the write operation, higher drive of the
242 S.K. Gupta and K. Roy
Fig. 11.18 (a) Asymmetrically Doped (AD) FinFETs (b) AD FinFET based SRAM and (c)
comparison of the metrics of AD FinFET SRAM with standard FinFET SRAM (Std.) [23]
access transistors leads to increase in the write-ability. In addition lower cell leakage
is achieved at the cost of increase in the access time. Unlike ADSE FinFET SRAM,
there is no area penalty associated with this technique.
Fig. 11.20 (a) Schematic and (b) thin-cell layout of BMIG FinFET SRAM [20]
Figure 11.20 shows the IG FinFET SRAM proposed in [20], in which the two gates
of the access transistors are controlled by two word-lines. The strength of the access
transistor can be modulated between two values in the ON state by independently
asserting/de-asserting the two wordlines. (Henceforth, the IG FinFET SRAM in
Fig. 11.20 is called bi-mode independent gate (BMIG) FinFET SRAM). The two
ON state modes for BMIG FinFETs are, henceforth, called high strength (HS)
and low strength (LS) modes. During the read operation, only RWL is asserted
(LS mode). Due to weak access transistor, read stability increases. This allows
one fin to be used in the pull down transistors (instead of two, as in standard
FinFET—see Sect. 2), thus reducing cell leakage. During the write operation,
244 S.K. Gupta and K. Roy
Fig. 11.21 Two BMIG SRAM cell showing pseudo-read problem during write
both WL and RWL are asserted (HS mode), increasing the strength of the access
transistor. Hence, the conflicting requirements for the access transistor for high read
stability and write-ability are mitigated. However, this technique incurs a cell area
penalty (Fig. 11.20b), compared to TG FinFET SRAM due to the back gate contact
of the access transistors. Moreover, weak access transistor results in significant
degradation in the access time and therefore, limits the application of BMIG FinFET
SRAM, as proposed in [20], to low power low throughput applications. Further,
decreasing the number of fins in the pull-down transistors to 1 while using the access
transistor in the HS mode during write worsens the pseudo-read issue [29, 30] in
BMIG FinFET SRAM array, as illustrated in Fig. 11.21. Suppose write operation
needs to be performed on cell A for which the bitlines are charged to the appropriate
voltages. For another cell B in the same row, belonging to a different word, the
bitlines are precharged to VDD so that no new value is written on that cell. On
asserting the wordlines, the node of cell B storing ‘0’ gets charged to some positive
voltage VPSEUDO-READ depending on the resistive divider action of AXL and NL,
similar to the read operation. In the presence of process variations, VPSEUDOREAD
may become larger than the trip point of the inverter formed by AXR, PR and NR
causing the contents of cell B to flip. This is known as the pseudo-read or half-select
problem. In BMIG FinFETs, the relative strength of access transistors with respect
to pull-down transistor is higher than TG FinFET SRAM since a single fin is used in
pull-down FinFET. This worsens the pseudo-read problem in BMIG FinFET SRAM
compared to TG FinFET SRAMs. Mitigation of the pseudo-read problem can be
achieved by architectural techniques like write-back scheme [31] in which the whole
row is written simultaneously. Table 11.5 summarizes the differences between TG
and BMIG FinFET SRAMs [20].
In addition to low throughput applications, BMIG FinFETs are also suitable for
dynamic voltage frequency (DVF) scalable SRAMs [22]. During high workload
conditions (high voltage - VDDH operation), both the wordlines are asserted (HS
mode) during read and write to achieve fast operations. At low workload conditions,
11 Low Power Robust FinFET-Based SRAM Design in Scaled Technologies 245
the cell is operated at low voltage (VDDL ). Although the frequency requirements
are low, VDD scaling may not be possible in conventional TG FinFET SRAM due
to increase in read and write failures. However, in BMIG FinFET SRAM, only
one wordline is asserted (LS Mode) during read at VDDL which increases the read
stability and allows low voltage operation. During write at VDDL , both the wordlines
are asserted (HS mode) to increase the strength of the access transistors and achieve
low write failures. Since both the wordlines are asserted during read at VDDH , two
fins need to be used in the pull-down FinFETs to achieve high read stability. This
makes the operation of BMIG FinFET SRAM at VDDH identical to TG FinFET
SRAM. Also, the cell leakage of BMIG FinFET SRAM is the same as TG FinFET
SRAM. Moreover, the pseudo-read issue in BMIG FinFET SRAM with a single
fin in the pull-down FinFET discussed above is mitigated in DVF scalable BMIG
FinFET SRAM since the write operation of TG and BMIG FinFET SRAMs are
identical.
Figure 11.22 shows VDD versus the inverse of access time for BMIG FinFET
at worst case process corner for read stability, illustrating the switching between
LS and HS modes during the read operation. In the frequency band FB4, access
FinFET is operated in the HS mode since the frequency requirements are high. In
FB3, VDD scaling is precluded by read failures in the HS mode. The cell is still
operated in the HS mode since LS mode of operation needs higher VDD to meet the
frequency requirements in FB3. As transition is made from FB3 to FB2, the access
transistor is switched to the LS mode, which allows further VDD scaling. Finally,
VDD scaling is limited by read failures in the LS mode in FB1. Note, in TG FinFET
SRAM, VDD scaling is not possible at low frequencies due to read failures; however
due to bi-modal operation during read, BMIG FinFET SRAM exhibits significantly
246 S.K. Gupta and K. Roy
lower minimum supply voltage (VMIN ) at low frequencies. The write operation of
BMIG FinFET is identical to TG FinFET SRAM, hence write-ability of the cell is
not affected. Thus, read-write conflict is mitigated in BMIG FinFET SRAM at low
frequencies, at the cost of larger cell area due to back gate contacts of the access
transistors [22].
In Sect. 7.1, we discussed the benefits of BMIG FinFETs for DVF scalable
SRAMs. The bi-modal operation of BMIG FinFET SRAMs results in significant
reduction in VMIN at low frequencies, compared to TG FinFET SRAM. However, no
improvement in write-ability is achieved. In this sub-section, we discuss tri-mode
independent gate (TMIG) FinFET SRAM which achieves improvement in write-
ability along with increase in read stability at low frequencies and similar access
time for high frequency operation [22].
This technique exploits the fact that spacer lithography technology used for
fabrication of FinFETs [26] offers a single-fin and two-fin transistors in the same
lithographic pitch. Thus, depending on the contact size and the fin pitch, the
difference between the device footprint of FinFETs with one and two fins may be
minimal. In TG and BMIG FinFET SRAMs, one fin is used in the access transistor,
which requires etching of one of the fins after the two fins are fabricated using
spacer lithography technique. However, in TMIG FinFET SRAM, the second fin
is not etched away (Fig. 11.23). Instead, the gate metal is etched from the top
of Fin2 (Fig. 11.23) to allow independent control of the two gates of Fin2. One
of the gates of Fin2 is connected to the two gates of Fin1 (FG) while the other
gate of Fin2 is controlled by BG. This structure allows three-way modulation
of the strength of the access transistor, as shown in Fig. 11.24. Let us define
VFG and VBG to be the voltages applied at the FG and BG of TMIG FinFET.
Therefore, (VFG ,VBG ) D (VDD ,VDD ) leads to the highest strength of TMIG FinFET
(high strength (HS) mode) followed by (VFG ,VBG ) D (VDD ,0) (medium strength
(MS) mode) and (VFG ,VBG ) D (0,VDD ) (low strength (LS) mode).
Figure 11.25 shows the schematic and layout of TMIG FinFET based 6 T SRAM.
An increase in area in TMIG FinFET SRAM compared to TG FinFET SRAM is
Fig. 11.25 (a) Schematic and (b) thin-cell layout of TMIG FinFET SRAM
observed due to the back gate contacts of the access transistors, similar to BMIG
FinFET SRAM. Also note, an increase in x-dimension of TMIG FinFET SRAM
cell allows three fins to be used in the pull down transistor without further increase
in the area. The two wordlines modulate the strength of the access transistor between
three values and achieve reduction in cell VMIN along with the mitigation of read-
write conflict, as explained next.
248 S.K. Gupta and K. Roy
During write operation, TMIG FinFETs are operated in the HS mode by asserting
both the wordlines. This leads to increase in the write-ability of the cell. During the
read operation at high workload conditions, the access transistors are operated in
the MS mode by asserting only WL_WRH to achieve similar access time and read
stability as TG FinFET SRAM. Read operation during low workload conditions is
performed by operating TMIG FinFETs in the LS mode to increase read stability
and allow low voltage operation. The switching between MS mode and LS modes
of operation during read in TMIG FinFET SRAM is similar to that between HS and
LS modes in BMIG FinFET SRAM, as described in Sect. 7.1 (see Fig. 11.22). Due
to larger number of fins the TMIG FinFET SRAM, cell leakage increases. However,
note, MS mode in TMIG FinFET SRAM has a larger strength compared to a single
fin access transistor in TG FinFET SRAM. Therefore, VT of TMIG FinFET SRAM
can be increased to achieve reduction in cell leakage while still achieving similar cell
access time at high VDD compared to TG FinFET SRAM. Hold stability in TMIG
FinFET SRAM is slightly degraded compared to TG and BMIG FinFET SRAM due
to increase in fin ratio of pull-down and pull-up FinFETs (NPD :NPU ).
Figure 11.26 shows the iso-leakage comparisons of VDD versus frequency curves
for TMIG and TG FinFET SRAMs at the process corners that are worst from the
point of view of read stability and write-ability. It can be observed that dynamic
switching between MS and LS modes of operation during read achieves significant
reduction in VMIN at low frequencies while still maintaining similar performance
at high VDD . The plot in Fig. 11.26 is obtained by considering the four frequency
bands similar to Fig. 11.22 in case of BMIG FinFET SRAM. However, unlike
BMIG FinFET SRAM, TMIG FinFET SRAM achieves significant improvement in
write-ability and reduction in VMIN for the process corner that is the worst for write
operation (Fig. 11.26). Hence, tri-modal operation of TMIG FinFET SRAM allows
significant reduction in active VMIN of the cell across different process corners at the
cost of slight degradation in hold VMIN [22].
Fig. 11.26 Comparison of VDD versus 1/cell access time of TG and TMIG FinFET SRAMs for
process corners that are the worst for (a) read stability and (b) write-ability [22]
11 Low Power Robust FinFET-Based SRAM Design in Scaled Technologies 249
Fig. 11.27 Schematics of (a) BMIG [19] and (b) TMIG FinFET SRAMs with pass gate feedback
(PGFB) FinFET 6 T SRAM [24] and (c) comparison with TG FinFET SRAM
(WWL) which is charged to VDD during write. This weakens the pull-up transistors,
making it easier to flip the contents of the cell. During the read operation, WWL
is de-asserted which increases the strength of the pull-up transistors and raises
the logic threshold voltage of the inverter formed by NR, PR and AXR, thereby
increasing the read stability of the cell. However, the extra back gate contact can
lead to significant increase in the area of the bit-cell. Other techniques like gate-
workfunction engineering can be used to increase the write-ability. However, this
leads to lower hold stability and increase in the cell leakage [24].
Pass-gate feedback can also be employed in SRAMs using TMIG FinFETs as the
access transistors, as shown in Fig. 11.27b. During the read operation, the strength
of AXL (with BG connected to Q D ‘0’) is comparable to a single-fin FinFET [24].
In other words, AXL in TMIG FinFET SRAM is slightly stronger than AXL in TG
FinFET SRAM (since single-fin FinFETs are typically used as access transistors in
TG FinFET SRAM, as discussed previously). This tends to mildly decrease the read
stability of the cell. On the other hand, the strength of AXR (with BG connected to
QB D ‘1’) is twice of AXR in a TG FinFET SRAM. This increases the trip point
of the inverter formed by PR. NR and AXR and therefore, tends to increase the
read stability. The overall effect is that read stability of TMIG FinFET SRAM is
comparable to that of TG FinFET SRAM.
During the write operation, the strength of AXR (with BG connected to
QB D ‘1’) in TMIG FinFET SRAM is higher than that in TG FinFET SRAM. As a
result, large improvement in write-ability is achieved. As the voltage at node QB
discharges during write, the strength of AXR decreases, but is still sufficient to
provide high write-ability.
Hence, TMIG FinFET SRAM with pass-gate feedback achieves significant
improvement in the write-ability at comparable read-stability. In addition, co-
optimization of the gate workfunction leads to lower cell leakage along with
comparable hold stability at the cost of a mild increase in the access time. The
cell-level comparison of BMIG and TMIG FinFET SRAMs employing pass-gate
feedback is shown in Fig. 11.27c.
Connecting the back gate of the access transistor to the storage node may
have issues with the leakage and stability of the unaccessed cells during read
and write operations [19]. During the read operation, cell leakage and leakage
through AXL may increase significantly for the bit-cell belonging to an unaccessed
row. Considering the worst case, suppose the accessed bit-cell stores ‘0’ and all
the unaccessed bit-cells in the same column store ‘1’ (value opposite to that of
the accessed bit-cell), as shown in Fig. 11.28a. During the read operation, BL
discharges to some voltage (VDD -
V). Hence, VGS for the back gate of AXL of all
the unaccessed cells becomes positive (
V), leading to an increase in cell leakage
and bitline leakage through AXL. Increase in bitline leakage delays the development
of voltage differential
V between BL and BLB, resulting in increased access time.
During the write operation, the bit-cells belonging to the unaccessed rows may be
disturbed if they store the same value as the accessed node (see Fig. 11.28b—note
both the accessed and unaccessed cells store ‘0’ i.e. VQ D 0). Since the back gate
of AXR is turned ON, the storage node VQB of the unaccessed cells is disturbed
11 Low Power Robust FinFET-Based SRAM Design in Scaled Technologies 251
Fig. 11.28 Array-level implications of using pass gate feedback in SRAMs during (a) read and
(b) write. The access transistors shown in the figure are BMIG FinFETs or Fin2 of TMIG FinFETs
(Fin1 of TMIG FinFETs not shown)
8 Summary
References
1. Yuan Taur and Tak. H Ning, “Fundamentals of Modern VLSI Devices”, Cambridge University
Press, pp: 140–144.
2. Shekhar Borkar, Tanay Karnik, Siva Narendra, Jim Schantz, Ali Keshavarzi and Vivek De,
“Parameter variations and impact on circuits and microarchitecture”, Proc. IEEE Design
Automation Conference, 2003, pp: 338–342.
3. G.G. Shahidi, T.H. Ning, T.I. Chappell, J. H. Comfort, B. A. Chappell, R. Franch, C. J.
Anderson, P. W. Cook, S. E. Schuster, M.G. Rosenfield, M.R. Polcari, R.H. Dennard and B.
Davari, “SOI for a 1-volt CMOS technology and application to a 512 Kb SRAM with 3.5 ns
access time”, Int. Electron Device Meeting, Dec 1993, pp: 813–816.
4. K. K. Young, “Short-Channel Effect in Fully Depleted SOI MOSFETs”, IEEE Trans. Electron
Devices, vol. 36, no. 2, Feb. 1989, pp: 399–402.
5. M. Vinet, T. Poiroux, J. Widiez, J. Lolivier, B. Previtali, C. Vizioz, B. Guillaumot, Y. Le Tiec,
P. Besson, B. Biasse, F. Allain, M. Casse, D. Lafond, J.-M. Hartmann, Y. Morand, J. Chiaroni
and S. Deleonibus, “Bonded planar double-metal-gate NMOS transistors down to 10 nm”,
IEEE Electron Device Letters, vol. 26, no. 5, May 2005, pp: 317–319.
6. J.P. Colinge, M.H. Gao, A. Romano-Rodriguez, H. Maes, C. Claeys, “Silicon-on-insulator
‘gate-all-around device’”,Int. Electron Device Meeting, Dec. 1990, pp: 595–598.
7. J. Widiez, F. Dauge, M. Vinet, T. Poiroux, B. Previtali, M. Mouis, S. Deleonibus, “Experi-
mental gate misalignment analysis on double gate SOI MOSFETs”, Proc. Int. SOI Conf., Oct.
2004, pp:185–186.
8. D. Hisamoto, W. –C. Lee, J. Kedzierski, H. Takeuchi, K. Asano, C. Kuo, E. Anderson, T-J
King, J. and C. Hu;, “FINFET- a self-aligned double-gate MOSFET scalable to 20 nm”, IEEE
Trans. Electron Devices, vol. 47, no. 12, pp: 2320–2325, Dec 2000.
9. B.S. Doyle, S. Datta, M. Doczy, S. Hareland, B. Jin, J. Kavalieros, T. Linton, A. Murthy, R.
Rios, R. Chau, “High performance fully-depleted tri-gate CMOS transistors,” IEEE Electron
Device Letters, vol. 24, no. 4, April, 2003, pp: 263–265.
10. P. Ranade, H. Takeuchi, T. –J King and C. Hu, “Work Function Engineering of Molybdenum
Gate Electrodes by Nitrogen Implantation,” Electrochemical and Solid State Letters, vol. 4, no.
11, 2001, pp: G85-G87.
11. B. Yu, L. Chang, S. Ahmed, W. Haihong, S. Bell, Y. Chih-Yuh, C. Tabery, H. Chau, Q. Xiang,
T. –J. King, J. Bokor, C. Hu, M. –R. Lin and D. Kyser, “FinFET scaling to 10 nm gate length”,
Int. Electron Device Meeting, Dec 2002, pp: 251–254.
12. T. Ludwig, I. Aller, V. Gernhofer, J. Keinert, E. Nowak, R. V. Joshi, A. Mueller and
S. Tomaschko, “FinFET Technology for Future Microprocessors“, IEEE SOI Conference,
October 2003, pp. 33–34.
11 Low Power Robust FinFET-Based SRAM Design in Scaled Technologies 253
13. D. Lekshmanan, A. Bansal and K. Roy, ”FinFET SRAM: Optimizing Silicon Fin Thickness
and Fin Ratio to Improve Stability at iso Area,“ IEEE Custom Integrated Circuits Conference,
Sept. 2007, pp. 623–626.
14. H. Ananthan and K. Roy, ”Technology and circuit design considerations in quasi-planar
double-gate SRAM,“IEEE Transactions on Electron Devices, vol. 52, no. 2, Feb. 2006, pp.
242–250.
15. S. Gangwal, S. Mukhopadhyay and K. Roy, ”Optimization of Surface Orientation for High-
Performance, Low-Power and Robust FinFET SRAM,“ IEEE Custom Integrated Circuits
Conference, Sept 2006, pp. 433–436.
16. A. Bansal, S. Mukhopadhyay and K. Roy, ”Device-Optimization Technique for Robust and
Low-Power FinFET SRAM Design in NanoScale Era,“ IEEE Transactions on Electron
Devices, vol. 54, no. 6, June 2007, pp. 1409–1419.
17. A. Goel, S. K. Gupta and K. Roy, “Asymmetric Drain Spacer Extension (ADSE) FinFETs
for Low Power and Robust SRAMs”, IEEE Trans. Electron Devices, vol. 58, no. 2,Feb 2011,
pp:296–308.
18. K. Endo, S. –I. O’uchi, Y. Ishikawa, Y. Liu, T. Matsukawa, K. Sakamoto, J. Tsukada, K.
Ishii, H. Yamauchi, E. Suzuki, M. Masahara,” Enhancing SRAM cell performance by using
independent double-gate FinFET”, Int. Electron Device Meeting, Dec. 2008.
19. A. Carlson, Z. Guo, S. Balasubramanian, R. Zlatanovici, T. J. King Liu, B, Nikolic, “SRAM
Read/Write Margin Enhancements using FinFETs”, IEEE Trans. VLSI, vol. 18, no. 6, June
2010, pp:887–900.
20. Z. Liu, S.A. Tawfik, V. Kursun, “An independent-gate FinFET SRAM cell for high data
stability and enhanced integration density”, IEEE Int. SOI Conf., 2007, pp: 63–66.
21. S. A. Tawfik, Z. Liu, and V. Kursun, “Independent-Gate and Tied-Gate FinFET SRAM
Circuits: Design Guidelines for Reduced Area and Enhanced Stability”, Int. Conf. Microelec-
tronics, 2007, pp: 171–174.
22. S. K. Gupta, S. P. Park and K. Roy, “Tri-mode Independent Gate (TMIG) FinFETs for dynamic
voltage/frequency scalable 6 T SRAMs,” to appear in IEEE Trans. Electron Devices.
23. F. Moradi, S. K. Gupta, G. Panagopoulos, D. T. Wisland, H. Mahmoodi and K. Roy,
”Asymmetrically Doped FinFETs for Low-Power Robust SRAMs,“ IEEE TED, vol.58, no.12,
pp.4241,4249, Dec. 2011.
24. S. K. Gupta, J.P. Kulkarni, and K. Roy, ”Tri-Mode Independent Gate FinFET-Based SRAM
With Pass-Gate Feedback: Technology–Circuit Co-Design for Enhanced Cell Stability,“ IEEE
TED, vol.60, no.11, pp.3696,3704, Nov. 2013
25. Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolic, “Digital Integrated Circuits, A
Design Perspective”, Prentice Hall Electronics and VLSI series, pp:657–661.
26. D. Esseni, A. Abramo, L. Selmi, E. Sangiorgi, “Physically based modeling of low field electron
mobility in ultrathin single- and double-gate SOI n-MOSFETs”, IEEE Trans. Electron Devices,
vol. 50, no. 12, Dec. 2003, pp:2445–2455.
27. D. Lekshmanan, A. Bansal, K. Roy, “Body Thickness Optimization and Sensitivity Analysis
for High Performance FinFETs,” Device Research Conf., June 2007, pp: 91–92.
28. Y. –K.Choi, T. –J. King, C. Hu, “A Spacer Patterning Technology for Nanoscale CMOS”, IEEE
Trans. Electron Devices, vol. 49, no. 3, March 2002, pp:436–441.
29. F. Bauer. G. Georgakos, D. Schmitt-Landseidel, “A Design Space Comparison of 6 T and 8 T
SRAM Core-Cells”, Lecture Notes in Computer Science, vol. 5349, 2009, pp: 116–125.
30. M. –L. Fan, Y. –S Wu, V P –H Hu, C –Y Hsieh, P. Su, C. –T. Chuang, “Comparison of 4T and
6T FinFET SRAM Cell for Subthreshold Operation Considering Variability – A Model-Based
Approach”, IEEE Trans. Electron Devices, vol. 58, no. 3, March 2011, pp:609–616.
31. K. C. Chun, P. Jain, J. H. Lee and C. H. Kim, “A 3 T Gain Cell Embedded DRAM Utilizing
Preferential Boosting for High Density and Low Power On-Die Caches”, IEEE, Journal Solid
State Circuits, vol. 46, no. 6. June 2011, pp: 1495–1505.
32. M –L Fan et al, “Comparison of 4T and 6T FinFET SRAM Cell for Subthreshold Operation
Considering Variability – A Model-Based Approach”, IEEE Trans. Electron Devices, vol. 58,
no. 3, March 2011, pp: 609–616.
Chapter 12
Variability-Aware Clock Design
Abstract High-performance clock network design has been a challenge for many
years due to the drastically increasing effect of process variability. In addition,
tight power budgets have lowered supply voltage levels which make designs more
sensitive to noise. Together, variability and noise present a colossal challenge to
clock designers in order to meet timing, yield, and power simultaneously. This
chapter discusses the different strategies that designers use to ameliorate variability
and noise problems in clock network design.
1 Introduction
The clock is the most important signal in any synchronous design, because it
controls all synchronous communication. A failure in the clock signal can lead to
errors in one or more sequential elements. Strict constraints on clock timing must
be used to guarantee the correct behavior of synchronous digital blocks. The clock
period must be defined in such a way that data will always be ready and stable before
clock edge arrives at the clock pin of the sequential elements which are also called
clock sinks.
Figure 12.1 shows the timing parameters that must be considered to safely
determine clock frequency and ensure correct operation. Assume that TCK .n/0 is
clock arrival time at flip-flop A, at clock cycle n and TCK .n C 1/00 is clock arrival
time at flip-flop B during clock cycle .n C 1/. Data propagation time through flip-
flop A is represented by TPFFA . The minimum (maximum) delay associated with
the combinational logic is represented by TC;MI N (TC;MAX ). Flip-flop B setup and
hold times are represented respectively by T SFFB and THFFB . Given that TCLOCK
Tpffa Tc TSffb
A B
Tck(n)’ Tck(n)’’
Clock
is the clock period, Eqs. (12.1) and (12.2) represent the timing constraints that must
be satisfied to guarantee the correct data storage for the case illustrated by Fig. 12.1.
Equation (12.1) represents a lower bound to the clock period. The clock period
has to be chosen such that Eq. (12.1) is going to be respected for any path connecting
any two flip-flops in the design. Equation (12.2) limits the minimum delay TC;MI N
allowed for any combinational path. Both equations have to be simultaneously
respected to guarantee the correct circuit behavior.
The main challenge faced by clock designers is how to design robust clock
distribution networks respecting Eqs. (12.1) and (12.2) within the allowed power
budget. Traditional worst-case analysis is known to be robust but is usually too
pessimistic and leads to over-designed circuits that waste power. In the last decade
statistical techniques have begun to aid designers to safely reduce the amount of
pessimism in IC design. Statistical design, however, is taking longer than expected
to be applied to clock design, because the importance of clock timing on the circuit
robustness makes designers conservative.
An alternative to traditional clock tree based architectures began to appear
in microprocessor designs in the late 1990s. To decrease the effect of process
and environmental variability, microprocessor designers adopted clock meshes and
clock spines in many designs [3, 27, 35, 40]. Due to the redundant structure of
meshes and spines, fast and slow paths are able to compensate each other and
reduce the clock arrival time variability [18]. The main disadvantage presented by
these architectures, however, is the large power penalty caused by the extra wiring
resources and short circuit currents.
The remainder of this chapter discusses the impact of variation sources on the
clock design and presents different techniques to make the clock network less
sensitive to these variations. In the remainder of this section, we will present some
12 Variability-Aware Clock Design 257
important definitions and discuss the parameters that make a clock network robust.
In Sect. 2, the different sources of variability and their impact on clock networks are
discussed. Sections 3 and 4 present design strategies to create clock networks that
are less sensitive to variability.
1.1 Definitions
The clock network is designed so that the time it takes for a clock edge to travel from
the clock root to any sink is the same. This delay is defined as the clock arrival time
(At ). Due to design mismatch, the clock arrival time at the clock sinks is often not
exactly the same. The difference between the clock arrival time at different sinks is
defined as clock skew. Given clock sinks si ; sj 2 S and At .si / which represents the
clock arrival time at the clock sink si , the clock skew (S k) for a set of clock sinks S
can be defined in a more formal manner as follows:
1.2 Robustness
Clock robustness is key for the correct operation of any circuit. Unlike the other
signal lines, a fault in the clock signal can easily propagate to hundreds of sequential
elements generating errors in multiple logic bits. These logic errors can occur due
to two reasons: delay violations and glitches. Both types of faults can in turn create
logic errors.
Clock delay violations are caused by the non-robust budgeting of clock margins.
When the clock skew violates the budgeted margins due to process or environmental
variations, Eqs. (12.1) and (12.2) are also violated. This situation leads to the storage
of invalid values in some sequential elements. If the clock arrival time at the desti-
nation flip-flop is delayed or the clock edge at the transmitter flip-flop is expedited,
the data hold time constraints can be violated by the fastest paths in combinational
258 M.R. Guthaus and G. Wilke
logic (Eq. (12.2)). When the inverse situation happens, i.e., transmitter clock is late
or receiver clock is early, the time for data to propagate through combinational logic
is shortened leading to setup time violations (Eq. (12.1)).
Clock glitches are the another potential cause of logic errors. It can be cause by
two main issues, radiation [28] and poor synchronization of clock gating control
signals on latch-free clock gaters. A clock glitch is a spurious transition in the
clock signal that can lead to invalid values stored in affected sequential elements.
Although invalid data can be stored in memory elements due to a clock glitch, it
is possible that the failure is corrected when the next clock edge arrives. In case
the invalid data is not corrected by the following clock edge (e.g., write enable
signal was turned off) the error may propagate to multiple sequential elements in
the following clock cycles. Even when the invalid data is quickly corrected it can
still cause errors. During the time that a flip-flop stores an invalid value a data glitch
is generated. The data glitch will cause errors if it has enough time to propagate to
a memory element.
A robust clock network has to minimize the probability that a delay fault will
occur and minimize the chances of a clock glitch reaching several clock sinks. In
the next sections, techniques to design robust clock networks are described.
2 Variability
Understanding the different variability sources and properly modeling them is a very
important task during the clock design. When process and environmental effects
were not the main cause of clock skew, the analysis of clock network performance
could be done using a corner-based approach. Corner-based analysis bounds all
the parameters in the design by minimum and maximum values. Corner analysis is
robust if the worst case corner can be identified, but it also became overly pessimistic
as the impact of variability increases.
Statistical analysis resurged in the early 2000s as an alternative to reduce
pessimism under scenarios where the impact of variation sources is large. The use
of statistical methods do not guarantee by itself an accurate estimation of clock
skew values. The way that each source of variation is modeled strongly affects the
accuracy in the clock skew estimation. The correct modeling of each variability
source is fundamental for the proper estimation of the delay margins. In this section,
we discuss the characteristics of the different types of variation and how they should
be modeled.
(ignoring the X). PVTX variations, however, are a significant challenge in the
synthesis of clock networks, because they can add skew and jitter which reduces
timing margins.
Each form of variation can be modeled as either a statistical distribution or
as a bounded corner value. The distribution and, more importantly, the correla-
tion of parameters must be considered in order to model a source of variation
[29]. Variation that is observed from die-to-die (D2D) is called inter-die variation
while variation that occurs within a die (WID) is called intra-die variation. Often
the WID variation exhibits significant spatial correlation so that nearby locations
are correlated but distant locations are not. Many sources of variation are systematic
which can be predicted in terms of design or process details. Systematic variation
is distinctly different from correlated variation because it is not a random variable
when the cause of variation is considered [15].
2.1.1 Process
2.1.2 Voltage
On-chip power supply variation is one of the most significant sources of variation in
a clock network [39,41]. Due to the high power consumption during peak switching,
large L ddti voltage drops can occur in the power grid. Chip-level clock distributions
include decoupling capacitors with the large clock drivers to reduce supply-
related variation. The decoupling capacitors low-pass filter the supply variation
and remove sharp current peaks. The remaining supply variation has high spatial
correlation [41].
2.1.3 Temperature
On-chip temperature fluctuations are also very significant for both device and
interconnect variations. Temperature directly affects the mobility of silicon and the
resistance of metal wires [2, 22]. The effect on interconnect capacitance, however,
is relatively small. The spatial correlation of temperature is highly dependent on the
260 M.R. Guthaus and G. Wilke
package and heat sink model. For high-performance designs with large heat sinks,
there can be temperature gradients of 30–40 ı C on-chip. Whereas, for GPUs (with
high activity in all regions) or low-performance designs without active heat removal,
the ambient chip temperature will be higher yet gradients will be lower [41]. Despite
the high spatial correlation, significant temperature-induced skew can be present
between the initial (cool) chip and a steady-state thermal map due to mismatch
in clock sensitivities to temperature. The temperature map can also dynamically
change depending on activity patterns.
2.1.4 Crosstalk
The clock network is the largest cross-talk aggressor due to its large capacitance and
span. However, signal nets also affect the jitter of individual clock branches in trees.
Adjacent layers have different routing orientations and thicker oxides, so within-
layer coupling is the most significant problem. Clock wires are usually shielded to
reduce these crosstalk effects [39].
The law of large numbers suggests that independent WID variation can be reduced
simply by increasing the number of random variables. For example, the mean and
standard deviation of a sum of independent Gaussian distributions is
X sX
D i i nd D i2 (12.4)
8i 8i
which implies that increasing ji j will also decrease the total ind . As a limit, this will
approach zero with infinite Gaussians. As an example, consider a single transistor
which is known to have Vth variation due to random dopant fluctuation. Pelgrom’s
model [31] demonstrates this with the relation
1
Vth / p (12.5)
WL
that implies that larger transistors will have less variation. Clock buffers in general
are large devices ranging from 10 to 64 minimum size for tree-type distributions
to huge 29 cm drivers for clock grids [4]. The previous models correctly imply that
Vth is in fact not very significant for clock buffers due to the averaging over very
large device sizes.
12 Variability-Aware Clock Design 261
The problem with the previous observation is that the law of large numbers only
holds true if the variations are uncorrelated. In the case of full correlation, the mean
and standard deviation become
X X
D i corr D i (12.6)
8i 8i
There are several approaches used in industry to model variation. The oldest
approach is to increase the tbudget with a timing margin to “guard band” the setup
and hold times without specifically analyzing the variation. Most frequently, timing
sign-off will use worst-case corners that analyze the worst local skew when portions
of a clock distribution are “fast” and others are “slow”. The number of corners
in such approaches were traditionally only for PVT variations, but additional
corners have been added for individual metal layers, Negative Bias Temperature
Instability (NBTI), and other sources of variation. This is to prevent the process (P)
term from becoming overly pessimistic when all sources are assumed worst-case
simultaneously.
Despite an exponentially increasing number of corners [44], multi-corner timing
analysis is the current industry standard practice. Incremental algorithms are used
to perform common path pessimism removal (CPPR) [20, 49] due to shared clock
buffers that cannot be simultaneously fast and slow in a single corner. Most CPPR
algorithms are iterative and worst-case exponential time complexity. Other methods
have used data from multiple corners and normalized delay spreads to quantify the
effects of variation on clocks [33].
Statistical static timing analysis (SSTA) solves the previous problems [1, 7, 45],
but has a number of daunting challenges of its own such as obtaining accurate
variation models, efficiently considering non-Gaussian and non-linear sources of
variation, and gaining designer acceptance. There are several surveys and books that
cover the recent advances in SSTA [5,13,36] and are applicable to clock researchers.
We now summarize the requirements that are needed to accurately assess the
robustness of a clock distribution with respect to variation.
Both device and interconnect models must be considered. Ignoring device
variation implies that one can over-buffer a design making interconnect variation
less significant. Similarly, without interconnect variation, one can under-buffer
the design which makes the delay of the devices less significant. The relative
importance of device and interconnect variability also depends on whether the
synthesis algorithm is considering local, regional, or global distributions. Local
distributions are affected more significantly by device variability whereas global
distributions are affected more significantly by interconnect variability.
The physical manifestation of variability must be considered in the model.
Blindly selecting variation as a percentage of delay or a parameter is not correct.
Most variations (Vth , LER, etc.) are not a percentage. The correct model requires
careful consideration of the specific source of variation.
12 Variability-Aware Clock Design 263
Assuming that only WID variation matters is not correct. D2D variation also
matters because parameter mismatch (between layers, implants, etc.) can lead to
different insertion delay sensitivities and therefore skew.
Ignoring correlation for WID variation is not correct. If WID variation is
assumed to be completely uncorrelated the law of large numbers can always be
applied, but this may not be appropriate. It is necessary to consider the right
correlation model for each source of variability.
Both skew and jitter must be simultaneously considered. Otherwise, insertion
delay and jitter sensitivity can be increased to reduce skew or skew can be increased
to reduce jitter. In high-performance designs, skew and jitter are on the same order
of magnitude.
Variability must be analyzed statistically. Simple corner-based approaches are
too pessimistic and do not allow for the cancellation of correlated variation or the
reduction of variation due to the law of large numbers (when appropriate) [44].
Multi-corner methods do not scale since the number of corners will become too large
[44]. Therefore, statistical methods should be used with the appropriate variation
models.
3 Clock Trees
The simplest style of clock distribution network is an H-tree. The H-tree, as shown in
Fig. 12.2, uses a recursive “H” routing pattern and inserts buffers at regular intervals
in the hierarchy. H-trees are advantageous because they are simple and fairly power
efficient. They are also symmetric and therefore resistant to inter-die sources of
variation. However, H-trees can still be plagued by intra-die sources of variation as
shown in [19]. Besides their robustness, H-trees are only typically useful for the
top levels of clock distribution since the sink loads and locations are never uniform.
Loads can be balanced with dummy gates, but this can increase the clock tree power
consumption considerably. In addition, individual sinks may have differing clock
edge requirements according to a useful skew budget.
In response to the non-uniformity of clock sinks, balanced clock trees offer
a low-power solution with equal delay to clock sinks as illustrated by Fig. 12.3.
The diagonal routes are later embedded into traditional rectilinear layers. Several
researchers have presented exactly zero skew [6, 8, 11, 43], bounded skew [10, 21]
and heuristic methods [23, 24] to construct these trees. Most balancing techniques
rely on simplified delay models such as Elmore delay [12], which are known to be
only approximations of more complex delays [32,34]. However, they are still useful
in generating initial solutions that can be refined with more accurate models later.
The main disadvantage of balanced trees is that they often assume process
parameters are nominally equal for different interconnect layers and do not consider
metal 6
net A
net B
metal 6
net A
net B
the impact of process variation on skew. The different variability sources described
in Sect. 2 can introduce skew in the clock tree even when it has zero-skew under
the nominal case. The effect of variations sources is very significant in today’s
technologies and its effect has to be account on every design.
One way to make a clock network more robust is by equalizing the way it affects
the clock network. Die-to-die variability effects on the clock network, for instance,
can be completely eliminated by equalizing the sensitivity of each parameter on the
delay on each path. Consider the two nets shown in Fig. 12.4, net A and B that
are routed using metal 5 (vertical) and metal 6 (horizontal) wires. Assuming that
the delay of nets A and B must be equalized, it is important that both nets have
exactly the same wire length. The routing shown in Fig. 12.4 guarantees nominal
zero skew between them, but if during fabrication metal 5 is faster while metal 6 is
slower, net B would become faster than net A. This situation can be avoided if both
nets are routed using the same amount of routing in each layer in the same order as
shown in Fig. 12.5. By doing so, if one metal layer gets faster while the other gets
slower both nets are affected in the same fashion. Sensitivity matching algorithms
have attempted to do this by sizing buffers and wires to equalize the sensitivities of
different process parameters using statistical timing analysis [17].
On the other hand, symmetrical clock trees are very efficient to reduce the
effect of die-to-die variations. Custom symmetrical clock trees can be built using
symmetrical routing algorithms such as [16, 30]. In general symmetrical clock trees
are more robust to die-to-die variations at the expense of higher power consumption
due to the longer total wirelength.
266 M.R. Guthaus and G. Wilke
CK_FFA CK_FFB
Original Available Tc
New Available Tc
CK_FFA d
CK_FFB d
4 Clock Meshes
A clock mesh is a grid of orthogonal wires to which clock buffers are connected.
The grid structure allows mesh buffer outputs to interact to each other. Whenever a
buffer switches before the others, it is slowed down by the late switching buffers.
This behavior reduces the overall difference between early and late arrival time. The
capability to compensate for any variability source makes clock meshes popular in
high-performance designs. Meshes and spines are widely used in the design of the
clock distribution for microprocessors [3, 27, 35, 40].
The buffers that drive clock meshes are typically driven by a clock tree. Non-
homogeneous clock sink distribution and obstructions may make the clock mesh
irregular requiring the clock tree to be custom designed for it. Clock sinks are
usually not directly connected to the clock mesh. Local buffered clock trees are
designed to cluster clock sinks and decouple the actual sink capacitance from the
clock mesh. The total clock mesh power consumption is reduced by decoupling
clock sink capacitance.
The high performance offered by clock meshes comes at the cost of high
power consumption. The redundant routing that shorts all mesh buffers represents
extra capacitance that must be charged/discharged every clock cycle. Clock gating
techniques are not as effective on clock meshes as they are on clock trees. Clock
gaters must be placed in the local cock buffers after the clock mesh. The mesh must
be divided into smaller sub-meshes in order to gate the mesh itself.
Short circuit current is another important contributor to total mesh power. As
usual, internal short circuit current flows between PMOS and NMOS devices when
the clock signal is switching. However, the clock mesh also has significant short
circuit currents flowing between different mesh buffers. Moreover, during the time
that mesh buffers are compensating for variation, short circuit currents flow between
buffers as illustrated in Fig. 12.7.
The contribution of short circuit current to total power consumption is dependent
on two design parameters: the clock skew at the input of the mesh buffers and
the total load driven by the clock mesh. When the input clock signal is poorly
synchronized at the input of the mesh buffers, short circuit current can have a long
duration which increases the power consumption. The contribution of short circuit
current to the clock mesh power consumption as a function of the input clock skew
is shown in Fig. 12.8. Figure 12.8 shows power consumption on the y-axis while the
x-axis shows the maximum clock skew at the input of mesh buffers normalized to a
1 GHz clock period. A roughly linear increase in the power consumption is observed
when the input skew increases.
The other main contributor to the short circuit power consumption are the mesh
buffer sizes. Larger mesh buffers produce larger short circuit currents. Mesh buffer
sizes are determined by the load connected to the clock mesh, the higher the load
the larger mesh buffers are required to be so timing constraints can be met. The load
driven by the clock mesh can be reduced by adding more level of buffers between
268 M.R. Guthaus and G. Wilke
T1 T2
VDD
OFF
CLK A
ON
GND
T1 < t < T2
VDD
ON
CLK B
OFF
GND
Fig. 12.7 Short-circuit current caused by different arrival time at mesh buffers
Fig. 12.8 Short-circuit current caused by different arrival time at mesh buffers [47]
12 Variability-Aware Clock Design 269
the clock mesh and the clock sinks. By decoupling clock sink capacitances from the
clock mesh the contribution of the clock mesh power to the total power of the clock
network is minimized.
Besides the clear performance advantage offered by clock meshes, they also
facilitate fast clock design turn-around. Due to the tight skew requirements, clock
trees must be redesigned every time changes are made at the clock sink level. Clock
meshes, on the other hand, are less sensitive since every sink is driven by a large
set of nearby buffers. Therefore, the effect of changes at the clock sinks is shared
among many buffers. If many random changes are performed the overall effect tends
to average out. This property allows the design of the clock network to be kept
unchanged when minor changes happen at the clock sink level.
The clock mesh conveniently shields the top-level clock tree from any changes
in the clock sink distribution. However, the clock mesh requires that mesh buffers
be spread over the entire region covered by the clock mesh. Any large macro within
this region must accommodate space for mesh buffers. Leaving buffer-less regions
in the clock mesh drastically reduces variability tolerance and increases clock slew.
Clock spines are frequently used when macros cannot accommodate mesh buffers.
Clock spines are essentially one dimensional clock meshes with buffers placed
along the line so that the outputs are shorted and the inputs are driven by a clock
tree. Figure 12.9 illustrates a clock spine. Although the clock spine also allows
faster switching buffers to compensate for late switching ones, its performance is
significantly inferior than a clock mesh. The two dimensional structure of clock
meshes reduces the equivalent resistance between the mesh buffers and enables
more effective compensation for variability. Compensation between the spine
buffers is almost exclusively adjacent and leads to high skew between distant points.
Clock Root
Clock Sinks
Conclusion
Clock robustness is a critical part of chip design as any timing fault can
propagate and produce multiple logic errors. A robust clock network must
properly account for the effect of the process and environmental variation
to ensure that timing margins are not violated. The sensitivity of the clock
network becomes even more evident as technology scales. Shrinking devices
amplify the effect of process variability due to fabrication challenges and
random dopant fluctuation in transistor threshold implants. Reduced supply
voltages are critical to reduce power consumption but also lead to reduced
noise margins which make designs more susceptible to power supply noise,
crosstalk and radiation effects. As discussed in the previous chapter, it is clear
that clock networks must be designed with robustness in mind. Variation-
tolerant clock trees, clock meshes and clock spines will continue to gain
importance in high performance designs.
References
1. Agarwal, A., Zolotov, V., Blaauw, D.T.: Statistical timing analysis using bounds and selective
enumeration. IEEE Transactions on Computer-Aided Design 22(9), 1243–1260 (2003)
2. Ajami, A., Banerjee, K., Pedram, M.: Modeling and analysis of nonuniform substrate
temperature effects on global ULSI interconnects. IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems 24(6), 849–861 (2005)
3. Bailey, D., Benschneider, B.: Clocking design and analysis for a 600-MHz Alpha micropro-
cessor. Journal of Solid-State Circuits (JSSC) 33(11), 1627–1633 (1998)
4. Benschneider, B.J., Black, A.J., Bowhill, W.J., Britton, S.M., Dever, D.E., Donchin, D.R.,
Dupcak, R.J., Fromm, R.M., Gowan, M.K., Gronowski, P.E., Kantrowitz, M., Lamere, M.E.,
Mehta, S., Meyer, J.E., Mueller, R.O., Olesin, A., Preston, R.P., Priore, D.A., Santhanam,
S.: A 300-MHz 64-b quad-issue CMOS RISC microprocessor. Journal of Solid-State Circuits
(JSSC) 30(11), 1203–1214 (1995)
5. Blaauw, D., Chopra, K., Srivastava, A., Scheffer, L.: Statistical timing analysis: From basic
principles to state of the art. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems 27(4), 589–607 (2008)
6. Boese, K., Kahng, A.: Zero-skew clock routing trees with minimum wirelength. In: ASIC
Conf., pp. 1.1.1–1.1.5 (1992)
7. Chang, H., Sapatnekar, S.: Statistical timing analysis considering spatial correlations using
a single PERT-like traversal. In: International Conference on Computer-Aided Design,
pp. 621–625 (2003)
8. Chao, T.H., Hsu, Y.C., Ho, J.: Zero skew clock net routing. In: Design Automation Conference
(DAC), pp. 518–523 (1992)
9. Chen, H., Yeh, C., Wilke, G., Reddy, S., Nguyen, H., Walker, W., Murgai, R.: A sliding
window scheme for accurate clock mesh analysis. In: IEEE/ACM International Conference
on Computer-Aided Design, ICCAD, pp. 939–946 (2005)
12 Variability-Aware Clock Design 271
10. Cong, J., Koh, C.K.: Minimum-cost bounded-skew clock routing. In: International Symposium
on Circuits and Systems (ISCAS), pp. 215–218 (1995)
11. Edahiro, M.: Minimum path-length equi-distant routing. In: Asia-Pacific Conf. on Circuits and
Systems, pp. 41–46 (1992)
12. Elmore, W.C.: The transient response of damped linear networks. Journal of Applied Physics
19, 55–63 (1948)
13. Forzan, C., Pandini, D.: Statistical static timing analysis: A survey. Integration, the VLSI
Journal 42, 409–435 (2009)
14. Geannopoulos, G., Dai, X.: An adaptive digital deskewing circuit for distribution networks.
International Solid-State Circuits Conference (1998)
15. Gupta, P., Heng, F.L.: Toward a systematic-variation aware timing methodology. In: Design
Automation Conference, pp. 321–326 (2004)
16. Guthaus, M., Sylvester, D., Brown, R.: Clock tree synthesis with data-path sensitivity
matching. In: Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific,
pp. 498–503 (2008)
17. Guthaus, M.R., Sylvester, D., Brown, R.B.: Process-induced skew reduction in deterministic
zero-skew clock trees. In: Asia and South Pacific Design Automation Conference (ASPDAC),
pp. 84–89 (2006)
18. Guthaus, M.R., Wilke, G., Reis, R.: Revisiting automated physical synthesis of high-
performance clock networks. ACM Trans. Des. Autom. Electron. Syst. 18(2), 31:1–31:27
(2013)
19. Hashimoto, M., Yamamoto, T., Onodera, H.: Statistical analysis of clock skew variation
in H-tree structure. In: International Symposium on Quality Electronic Design (ISQED),
pp. 402–407 (2005)
20. Hathaway, D., Alvarez, J.P., Belkbale, K.P.: Network timing analysis between signals travers-
ing a common circuit path. United States Patent 5,636,372 (1997)
21. Huang, D.J.H., Kahng, A.B., Tsao, C.W.A.: On the bounded-skew clock and Steiner routing
problems. In: Design Automation Conference (DAC), pp. 508–513 (1995)
22. Im, S., Srivastava, N., Banerjee, K., Goodson, K.E.: Scaling analysis of multilevel interconnect
temperatures for high performance ICs. IEEE Transactions on Electron Devices 52(12),
2710–2719 (2005)
23. Jackson, M.A.B., Srinivasan, A., Kuh, E.S.: Clock routing for high performance ICs. In: Design
Automation Conference (DAC), pp. 573–579 (1990)
24. Kahng, A.B., Cong, J., Robins, G.: High-performance clock routing based on recursive
geometric matching. In: Design Automation Conference (DAC), pp. 322–327 (1991)
25. Kapoor, A., Jayakumar, N., Khatri, S.P.: A novel clock distribution and dynamic de-skewing
methodology. In: International Conference on Computer-Aided Design (ICCAD), pp. 626–631
(2004)
26. Kim, S.D., Wada, H., Woo, J.C.S.: TCAD-based statistical analysis and modeling of gate
line-edge roughness effect on nanoscale MOS transistor performance and scaling. IEEE
Transactions on Semiconductor Manufacturing 17(2), 192–200 (2004)
27. Kurd, N.A., Barktullah, J.S., Dizon, R.O., Fletcher, T.D., Madland, P.D.: A multigigahertz
clocking scheme for the pentium 4 microprocessor. IEEE Journal of Solid-State Circuits
36(11), 1647–1653 (2001)
28. Mallajosyula, A., Zarkesh-Ha, P.: A robust single event upset hardened clock distribution net-
work. In: Integrated Reliability Workshop Final Report, 2008. IRW 2008. IEEE International,
pp. 121–124 (2008). DOI 10.1109/IRWS.2008.4796101
29. Nassif, S.R.: Modeling and forecasting of manufacturing variations (embedded tutorial). In:
Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 145–150 (2001)
30. Ozdal, M.M., Hentschke, R.F.: Exact route matching algorithms for analog and mixed signal
integrated circuits. In: Proceedings of the 2009 International Conference on Computer-Aided
Design, ICCAD ’09, pp. 231–238. ACM, New York, NY, USA (2009)
31. Pelgrom, M., Duinmaijer, A., Welbers, A.: Matching properties of MOS transistors. Journal of
Solid State Circuits (JSSC) 24(5), 1433–1439 (1989)
272 M.R. Guthaus and G. Wilke
32. Pillage, L.T., Rohrer, R.: Asymptotic waveform evaluation for timing analysis. IEEE Transac-
tions on Computer-Aided Design 9, 352–366 (1990)
33. Rajaram, A., Damodaran, R., Rajagopal, A.: Practical clock tree robustness signoff metrics.
International Symposium on Quality Electronic Design (ISQED) pp. 676–679 (2008)
34. Ratzlaff, C.L., Gopal, N., Pillage, L.T.: RICE: Rapid interconnect circuit evaluator. In:
Proceedings of the 28th conference on ACM/IEEE design automation, pp. 555–560 (1991)
35. Restle, P., Carter, C., Eckhardt, J., Krauter, B., McCredie, B., Jenkins, K., Weger, A., Mule, A.:
The clock distribution of the POWER4 microprocessor. In: International Solid-State Circuits
Conference (ISSCC), pp. 144–145 (2002)
36. Sapatnekar, S.: Timing. Springer (2004)
37. Steinhogl, W., Schindler, G., Steinlesberger, G., Traving, M., Engelhardt, M.: Impact of
line edge roughness on the resistivity of nanometer-scale interconnects. Microelectronic
Engineering 76, 126–130 (2004)
38. Stine, B.E., et al., D.S.B.: The physical and electrical effects of metal-fill patterning practices
for oxide chemical-mechanical polishing processes. IEEE Transactions on Electron Devices
45(3), 665–679 (1998)
39. Sze, C.: Personal communication (2010). IBM Austin Research Lab, Austin, TX
40. Tam, S., Rusu, S., Desai, U.N., Kim, R., Zhang, J., Young, I.: Clock generation and distribution
for the first ia-64 microprocessor. IEEE Journal of Solid-State Circuits 35(11), 1545–1552
(2000)
41. Teodorescu, T.: Personal communication (2010). ATI Radeon 5870 Clock Designer, Sunny-
vale, CA
42. Tsai, J.L., Zhang, L., Chen, C.C.P.: Statistical timing analysis driven post-silicon tunable clock-
tree synthesis. In: International Conference on Computer-Aided Design (ICCAD), pp. 575–581
(2005)
43. Tsay, R.S.: Exact zero skew. In: International Conference on Computer-Aided Design
(ICCAD), pp. 336–339 (1991)
44. Visweswariah, C.: Death, taxes and failing chips. In: Design Automation Conference,
pp. 343–347 (2003)
45. Visweswariah, C., Ravindran, K., Kalafala, K., Walker, S.G., Narayan, S.: First-order
incremental block-based statistical timing analysis. In: Design Automation Conference,
pp. 331–336 (2004)
46. Wang, R., Koh, C.K.: A frequency-domain technique for statistical timing analysis of
clock meshes. In: IEEE/ACM international conference on Computer-aided design, ICCAD,
pp. 334–339. Piscataway, IEEE Press, San Jose, CA (2007)
47. Wilke, G., Fonseca, R., Mezzomo, C., Reis, R.: A novel scheme to reduce short-circuit power
in mesh-based clock architectures. In: Symposium on Integrated Circuits and System Design
(SBCCI), pp. 117–122 (2008)
48. Ye, X., Li, P., Zhao, M., Panda, R., Hu, J.: Analysis of large clock meshes via harmonic-
weighted model order reduction and port sliding. In: IEEE/ACM international conference on
Computer-aided design, ICCAD, pp. 627–631. Piscataway, IEEE Press, San Jose, CA (2007)
49. Zejda, J., Frain, P.: General framework for removal of clock network pessimism. In: Interna-
tional Conference on Computer-Aided Design (ICCAD), pp. 632–639 (2002)
50. Zhang, L., Yu, W., Zhu, H., Zhang, W., Cheng, C.K.: Clock skew analysis via vector fitting
in frequency domain. In: International Symposium on Quality Electronic Design, ISQED, 9.,
pp. 476–479. Los Alamitos, IEEE Computer Society, San Jose, CA (2008). DOI 10.1109/
ISQED.2008.4479780