An ASIC Primer: LSI Logic Corporation 1988 - 2000
Table of Contents
Preface
Complexity
NRE
Performance
Section 3 - Feed Late-Occurring Signals into the Last Logic Levels
Section 5 - Use Duplicate Logic to Reduce Fanout and Increase Speed
Speed
Subtleties
lsints
Summary
List of Figures
Figure 2.1 - Comparison of 2-input NAND and NOR Gates
Figure 2.2 - Crowbar Current
Figure 2.3 - I/O Slot Used as Internal Buffer
Figure 2.4 - Repeater Buffers
Figure 2.5 - Split Fanout
Figure 2.6 - A Bad Place for a Memory
Figure 3.1 - Capacitance Due to Cells vs. Capacitance Due to Wires
Figure 3.2 - Comparison of 2-input NAND and NOR Gates
Figure 3.3 - Comparison of 2-input NAND and NOR Gates
Figure 3.4 - 3-stage Johnson Counter with Count Sequence
Figure 3.5 - Duplicate Logic to Increase Speed
Figure 3.6 - Better Wire Models After Cell Placement
Figure 4.1 - Classic Prop-Delay Based Pulse (Glitch) Generator
Figure 4.2 - Synchronous Pulse Generator
Figure 4.3 - Synchronous Pulse Generator Timing Diagram
Figure 4.4 - Glitch Behavior
Figure 4.5 - Glitch Generating Mux
Figure 4.6 - Timing Diagram of Mux Output Glitch
Figure 4.7 - The Glitch Doesn’t Matter Here
Figure 4.8 - A Gated Clock
Figure 4.9 - Controlling Data Instead of Clock is Safer
Figure 4.10 - A Race Condition
Figure 4.11 - Waveforms for Figure 4.10
Figure 6.12 - Input Test Patterns Using NRZ Waveforms Only
Figure 6.13 - Adding RZ to NRZ Reduces the Number of Patterns
Figure 6.14 - Simulation Inputs Replicated by Timing Generators
Figure 6.15 - Simulation Inputs that Can’t be Replicated by Timing Generators
Figure 6.16 - Tester Tolerance
Figure 7.1 - The Vil/Vih NAND Tree
Figure 7.2 - Example NAND Tree Input Patterns
Figure 7.3 - A NAND Tree with Bidirects
Figure 7.4 - Test Patterns for NAND Tree with Bidirects
Figure 7.5 - Using Output Strobe #2
Figure 7.6 - Long Path Delay, Dominated by Chip Circuitry
Figure 7.7 - Short Path Delay, Dominated by Tester Capacitance
Figure 7.8 - NOR Gate Output Edge Asymmetry
Figure 7.9 - NOR Output Asymmetry Generating a Spike/Glitch
Figure 7.10 - Avoiding the Glitch
Figure 7.11 - Avoiding Bus Contention
List of Tables
Table 1.1 - Technology Comparisons
Preface
This book has been written to serve as a general guide for the ASIC
design engineer. It covers a number of important fundamentals related
to design methodology relevant to application specific integrated
circuits (ASICs). Although different silicon technologies are
mentioned, this book does not focus on any one technology type.
1
What is an ASIC?
CHAPTER 1 - SECTION 1
Uses of ASICs
To save chip area, ASIC technology integrates the logic and much of
the memory formerly distributed among multiple ICs, thus improving
reliability, optimizing PC Board space, and reducing component
costs. In addition, the higher integration and smaller size result in
significantly better system performance. ASICs were originally used
solely to replace or consolidate TTL “glue” logic and consisted of
relatively low complexity logic. Improvements in design tools and
implementation software, in process technology, and in large pin
count packages, now integrate much more of the logic formerly
The two most popular types of mask programmable ASICs that are
being used in a great many System-on-a-Chip designs today are:
- Array-Based
- Cell-Based
Array-Based ASICs
Cell-Based ASICs
CHAPTER 1 - SECTION 3
The Characteristics of ASICs
The remarks that follow further discuss some trade-offs of ASICs
with respect to the following categories:
Complexity
Silicon Efficiency
NRE
Inventory Risks
Design Risks
The evolution of EDA tools over the last 20 years has been
mind-boggling! In the 70s, vendors were struggling with the idea of
automated layout. You may have seen things like high speed color
plotters, digitizing tables, light tables for checking the electronic
version against the hand drawings against the plots, and so on. And
of course, companies were “cutting rubies” to send to mask
vendors. Well, most of the layout tasks have been automated with
very sophisticated place and route software, automated checking
tools, and automated flows between the silicon manufacturers and
the mask vendors.
The difficulty today is that design sizes are becoming so
large that the run times for many EDA tools are becoming
prohibitive. This is spurring interest in hierarchical EDA tools.
Performance
CHAPTER 1 - SECTION 4
Array-Based vs. Cell-Based Trade-offs
As a short comparison of Array-Based and Cell-Based
Technologies, we can note the following:
expensive mask sets, longer total design cycles, and much higher
NREs. LSI Logic and our customers are forced to consider only
higher volume designs to offset the additional NRE, so the higher
NRE of a cell-based design is no longer an impediment when
compared to the total cost of the project. Also, the longer cycle
times mean that the turn-around-time advantage of a Gate Array is
no longer very large compared to the total design cycle. The higher
volume of the design means that it is imperative to make the die
size as small as possible; the die area penalty of a Gate Array is no
longer acceptable. All these factors have led LSI Logic to abandon
the Gate Array concept upon which the company was founded.
CHAPTER 1 - SECTION 5
Standard Product ASIC Products (ASSPs)
One would think that many of the Standard Product companies
would have a big edge over ASIC companies in the area of ASSPs.
The reason, of course, would be the wealth of chips
already designed and characterized. It should be just a matter of
taking off the chip I/Os and voila - an instant Core! Unfortunately,
it's not as easy as all that. Most of the standard products that were
designed over the last 20 years were designed as hand crafted chips.
In some cases every single transistor was individually designed.
Along with this “customization”, trade-offs were made
that cannot now be undone. For example, many of the
microprocessor chips that are available were designed over a 2 to 3
year period. But as they were designed, logic models or simulation
models were not built along with them. So although the transistors
and subcircuits were run through SPICE for circuit analysis, there is
basically no way today to provide a model of these standard
microprocessors to the Verilog and VHDL simulators. These
standard products were also hand laid out in many cases. So no
“auto place & route” topocell was built.
CHAPTER 1 - SECTION 6
Embedded Arrays
We have covered the fundamental features and benefits of Array-
Based technology and Cell-Based technology, and we have
introduced a new type of integrated product called the Application
Specific Standard Product or ASSP. There is one other design
concept that has existed in the ASIC world - it is called the
Embedded Array approach. Although this type of product is
different from both Array-Based and Cell-Based, it is not totally
different from either of these two types of ASIC. As a matter of
fact, it exists as a marriage of the two technologies.
So the trade-offs are that you give up some of the random logic
performance vs. that available in a 100% Cell-Based design so that
you can gain turnaround time and reduce overall costs. These trade-
offs can offer an attractive alternative to designers that need the
capability of getting their designs into silicon fast, but who also
need the capacity offered by high density memories and core
technology. However, it is also important to note that as the volume
goes up and the performance requirements get tougher, the
Embedded Array will again fall short of the advantage offered by
Cell-Based ASIC technologies.
CHAPTER 1 - SECTION 7
Summary
Now that we have introduced the major categories of ASICs, we
need to cover some of the methodology required to design a circuit
in the ASIC world. The following Chapters cover various pieces of
ASIC methodology starting with “Basic Building Blocks” in
Chapter 2.
Figure 1.1
2
Design Trade-offs
- I/O Cells
- ESD Cells
Of course, these days most of the issues presented here are taken
care of by the design tools, but it’s still good for the designer to be
aware of what’s going on. Rather than suggest hard and fast rules
for every design, the discussions in this chapter are meant more to
make designers aware of the pros and cons of choosing a particular
course of action.
CHAPTER 2 - SECTION 1
Use NAND Gates Rather than NOR Gates
Because the mobility of the electrons in N-channel transistors is
approximately twice the mobility of the holes in P-channel
transistors, N-channel transistors exhibit about one-half of
the ON channel resistance of P-channel transistors of the same size.
Figure 2.1 shows that the P-channel transistors of the NAND gate
are in parallel. The P-channel transistors of the NOR gate are in
series.
Figure 2.1 - Comparison of 2-input NAND and NOR Gates (panels: “NAND”, “NOR”; inputs A and B)
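The consequence of series versus parallel P-channel devices can be illustrated with a small calculation. The sketch below is only a first-order resistance model: it assumes equally sized transistors, an N-channel ON resistance of one arbitrary unit, and a P-channel ON resistance of exactly twice that. The function names are hypothetical and not part of any LSI Logic tool.

```python
# First-order comparison of worst-case drive resistance in 2-input gates.
# Assumption: equal-size transistors and R_P = 2 * R_N (from the roughly
# 2:1 electron/hole mobility ratio described in the text).

R_N = 1.0        # ON resistance of one N-channel transistor (arbitrary units)
R_P = 2.0 * R_N  # ON resistance of one P-channel transistor


def nand2_resistances(r_n=R_N, r_p=R_P):
    """NAND: N-channel devices in series, P-channel devices in parallel."""
    pull_down = 2 * r_n  # both series N devices must conduct
    pull_up = r_p        # worst case: only one of the parallel P devices conducts
    return pull_up, pull_down


def nor2_resistances(r_n=R_N, r_p=R_P):
    """NOR: P-channel devices in series, N-channel devices in parallel."""
    pull_up = 2 * r_p    # both series P devices must conduct
    pull_down = r_n      # worst case: only one of the parallel N devices conducts
    return pull_up, pull_down


if __name__ == "__main__":
    print("NAND2 (pull-up, pull-down):", nand2_resistances())
    print("NOR2  (pull-up, pull-down):", nor2_resistances())
```

Under these assumptions the NAND gate has matched pull-up and pull-down resistance (2 units each), while the NOR gate's pull-up path is four times as resistive as its pull-down path, which is why its rising edge is the slow one.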
Figure 2.2 - Crowbar Current (input swinging between Vdd and Vss over time; both transistors are ON during the transition)
CHAPTER 2 - SECTION 2
Proper Use of Buffers
Buffers are required when a signal is routed either to a large number
of destinations, or to a destination far away, or both. In the case of
the far away destinations, the long metal run will add significant
capacitance to the path parasitic model. In the case of the high
fanout, the sum of the input capacitances will also cause a large
capacitive parasitic component. In either case, there are several
techniques which could be used to effectively "buffer" the
capacitive fanout, each with its own advantages and drawbacks.
Figure 2.3 - I/O Slot Used as Internal Buffer (labels: long wires, high fanout)
In the LSI Logic G12 technology, the I/O slot buffer has been
replaced by a dedicated “megacell” having large geometry
transistors, but located within the core. You don’t lose an I/O slot,
but you still need to give the buffer extra power and ground routing,
which in many cases will limit its location to the periphery of the
die, so Figure 2.3 would still be more or less accurate.
Since most ASIC design toolsets used today rely on feedback from
the layout tools to some degree, the decision as to what type of
buffer scheme to use to resolve a delay or ramptime problem may
in fact be made by the layout software, given the correct user input.
For example, using a series of buffers along a wire (referred to as
repeater buffers) might be the overall best solution for driving a
fanout which is distributed all across the chip. The wire will be long
simply because the destination locations are all over the chip. But
driving a long wire from a single source, no matter how big the
driver is, will not be the most elegant solution. The concept of
repeater buffers will allow a signal to drive destinations across the
chip, and it guarantees that no single segment of the wire in the
path is longer than a certain maximum. This means that none of the
repeater buffers in the path is likely to "see" a ramptime problem. It
also means that with a smaller capacitive load due to shorter wires
and smaller component fanout on any segment of the wire, the
repeater buffers can be smaller cells. This may mean that internal
cells are OK to use where a larger fanout may have even called for
the use of an I/O buffer slot. This tradeoff means that a designer
could save area and power dissipation by using repeater buffers.
The key here is that since the repeater buffers would be distributed
along the wire, they must have x-y coordinate information attached
to them. This means that cell placement information is necessary
for a tool or a human to be able to make this type of decision. See
Figure 2.4 for an example of the use of repeater buffers.
Figure 2.4 - Repeater Buffers (a signal crosses the die from Module A to Module B through repeater buffers; wire segment lengths do not exceed the technology maximum lengths, which helps eliminate ramptime problems)
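The segmenting idea behind repeater buffers can be sketched numerically. The following is a minimal illustration, assuming the wire is cut into equal segments and that a technology-specific maximum segment length is known; the function name and the micron values are hypothetical.

```python
import math


def repeaters_needed(total_length_um, max_segment_um):
    """Return (number of repeater buffers, length of each wire segment).

    The wire is cut into equal segments, none longer than max_segment_um,
    and one repeater is placed at each internal cut point.
    """
    if total_length_um <= 0 or max_segment_um <= 0:
        raise ValueError("lengths must be positive")
    segments = math.ceil(total_length_um / max_segment_um)
    return segments - 1, total_length_um / segments


if __name__ == "__main__":
    # Hypothetical numbers: a 10 mm route with a 3 mm maximum segment length.
    print(repeaters_needed(10000, 3000))
```

A real tool would also need the x-y placement of each repeater, as the text notes; this sketch only counts them.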
Many of the synthesis tools used today have a feature called “IPO”,
or in-place optimization. The basic mechanism here is that the
software can choose a larger or smaller buffer based on fanout. The
difficulty is that the fanout load is made up of both the input
capacitance of the destinations and the wire capacitance. But the wire
capacitance won't be known until the layout is complete.
Unfortunately, given the long run times associated with full layouts
for designs of today's size, engineers cannot always wait until the
layout is finished to make many of the decisions that will affect
design performance. The better ASIC toolsets provide designers
with an intermediate step to get layout type of feedback without
actually doing full layout. This step involves a chip floorplan. The
floorplan not only allows designers a means of refining wire delay
information, it can be tuned to a stage where cell placement
information is very close to that of the final layout. The only step
which is not performed is the actual wire routing. But given a good
cell placement, the details of the actual wire length can be
extrapolated with a fairly high degree of accuracy. This provides the
designer with the feedback in the form of a custom wireload model
which in many cases is close enough to the final routed wire model
that a designer can begin to make performance decisions without
actually routing the design! Then, tools like the IPO algorithm can
decide whether or not to adjust the size of buffers needed to drive
specific high fanout lines.
Figure 2.5 - Split Fanout (panels: Original, where BUF1 drives a high fanout across the die; Split Fanout, where the load is divided between BUF1 and BUF2)
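The kind of decision an in-place optimization step makes can be sketched as follows. This is not the algorithm of any particular synthesis tool: the buffer names, their load limits, and the capacitance values are all assumptions chosen for illustration. The point is only that the choice depends on pin capacitance plus an estimate of wire capacitance, which is what the floorplan-derived wireload model supplies.

```python
# Hypothetical buffer library: (cell name, maximum load in pF that the cell
# can drive while still meeting the ramptime limit). All values are assumed.
BUFFERS = [("BUF_SMALL", 0.5), ("BUF_MEDIUM", 1.0), ("BUF_LARGE", 2.0)]


def total_load_pf(pin_caps_pf, wire_cap_pf):
    """Load seen by the driver: destination pin capacitance plus wire capacitance."""
    return sum(pin_caps_pf) + wire_cap_pf


def pick_buffer(pin_caps_pf, wire_cap_pf, buffers=BUFFERS):
    """Return the smallest buffer able to drive the load, or None.

    None means no single buffer is adequate, so the fanout would have to be
    split between two buffers or served by repeater buffers instead.
    """
    load = total_load_pf(pin_caps_pf, wire_cap_pf)
    for name, max_load in buffers:  # list is ordered smallest to largest
        if load <= max_load:
            return name
    return None


if __name__ == "__main__":
    pins = [0.05] * 8               # eight destination pins, assumed 0.05 pF each
    print(pick_buffer(pins, 0.05))  # optimistic pre-layout wire estimate
    print(pick_buffer(pins, 0.45))  # estimate refined after cell placement
    print(pick_buffer(pins, 2.0))   # very long route
```

The same eight pins lead to three different answers depending only on the wire estimate, which is why a good custom wireload model matters before routing.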
CHAPTER 2 - SECTION 3
On-Chip Memory vs. Off-Chip Memory
ASIC technologies were originally designed to replace
combinatorial logic which was spread out over many small TTL
chips. In many of the very early systems, memory was not a big
consideration.
Today almost every single chip design includes some type and
amount of memory. So when does it make sense to put memory on
an ASIC chip? The basic decision is based on the criticality of the
design's performance. If it is crucial that the memory be right next
to the logic it interfaces with, then the ideal situation is to have both
pieces of the design on the same IC. But ASIC technologies do not
usually support the very high memory densities one might find in
DRAM chips. Even though the density of ASIC memories is
growing very fast, it will still never quite catch the density that can
be attained in the most dense memory technologies. One of the
1. the memories usually have a fixed size and aspect ratio based
on configuration;
Figure 2.6 - A Bad Place for a Memory (panels: Bad Placement of RAM; Acceptable RAM Placement; labels: RAM, Corner, Edge, I/O Regions)
CHAPTER 2 - SECTION 4
The Use of a PLL
In the best of all worlds, all chip designs would be totally
synchronous and only have one clock signal. Alas, we live in the
real world. And unfortunately, even those designs that appear to be
synchronous can still suffer clock problems. Just about any scheme
But the serial delay issue can cause a problem in a system where
multiple chips need to be synchronized. So in effect, we need a way
to synchronize the clock that reaches the flip flops on chip with the
incoming clock. A very effective way to do this is to use a Phase
Locked Loop module. This can basically guarantee that a designer
can adjust the phase of the clock on one chip so that the overall
system is synchronous. The PLL cell itself is relatively small, but it
introduces an annoying issue into the design: analog circuitry! This
requires that analog power and ground busses be added into the
design and brought out to chip pins properly so that they do not
interact with the digital power and ground busses on chip.
In general, the PLL will be able to adjust for just about any amount
of serial delay that could result from long wires and clock trees, but
the extra work required to handle the PLL cell correctly adds one
more item to a designer's list of things to do.
CHAPTER 2 - SECTION 5
Hard Coded vs. Soft Coded Logic
Hard coded logic refers to logic that has been laid out before
the actual place and route of the design itself begins. Some types of
logic almost require hard coded layouts just due to their nature.
Entities like memories are always hard coded because their regular
structure cries out for a fixed layout for all of the
bit cells, given their unique relationship to each other. Also, the
control signals for the memory are of such a regular nature that it
would be terribly inefficient to route them if the memory bit cells
were laid out in a random fashion.
All designs will have many signals that travel some distance across
the chip in order to reach all of their destination points. Given that
ASIC chips have regular orthogonal “tracks” for routing wires,
there are only so many routing tracks available on a given die size.
So it's important to make sure that routing "space" is used wisely.
When signals have to take detours around blockages, the bends or
corners actually use up more die area than a straight piece of metal
would, especially since changing direction usually means changing
routing layers. So if possible it would be desirable not to have to
bend any signal wires at all. Of course this is not possible, so the
goal is to try and minimize the overall number of bends in wires as
they route to and from their points of contact. The more hard coded
logic on the die, the more detours signal wires are likely to have to
take. This tends to cause longer wires, larger die sizes, and can
create congestion in areas where it would not have existed. So the
use of hard coded logic should be minimized.
Here are some cases where hard coded logic is actually required:
- memories
- special Cores
- macrocells
- critical paths
- datapaths
You may have a need for hard coded cells for reasons unique to
your design, but it is best if you can minimize the need for cells
with fixed layouts.
CHAPTER 2 - SECTION 6
Cores and Megacells
LSI Logic has created a library of large system level building
blocks which implement many industry standard functions. For
some detailed information about LSI Logic’s CoreWare® modules,
visit the LSI Logic Web page and specifically the section which
describes the CoreWare program. It is located at:
http://www.lsilogic.com/products/coreware/coreware.html#library
The types of cores that are generally available are functions that
have become very popular as building blocks for large systems. The
ATM core is very popular because many systems today deal with
asynchronous data which has to be moved from one place to
another. The JPEG and MPEG Cores allow designers to use
standard logic to handle still pictures and moving pictures without
having to design logic from scratch to implement these standards.
Many of the Cores handle bus interface issues for systems that have
to communicate with standard bus protocols. LSI provides Cores
for the PCI interface, the Fibre Channel interface, the USB
interface, the Ethernet interface, and others. In fact, as new and
more popular interfaces come along, LSI will introduce Cores to
allow designers to incorporate these standard protocols into their
system level chips.
More and more Mixed Signal applications are in vogue today. And
LSI provides the most obvious Cores for the Mixed Signal
interfaces: the A/D and D/A converters. And many of the new I/O
cells that have been designed and are being designed, address the
area of Mixed Signal interface.
Much work has been done at LSI Logic in the area of DSP and
video compression. So in addition to Cores such as JPEG and
MPEG, you will find video compression chips, error detection and
correction chips, and new coding chips to allow for even denser
compression algorithms.
But it doesn't make sense to design these large Cores from scratch.
LSI provides these Cores as part of our growing CoreWare library.
They are meant to be used in many cases much like designers use
basic low level cells - as is. In other words, the Core itself is already
built as a cell. In many cases it may include a hard coded topocell
which can be used by designers as a black box through the
synthesis process.
CHAPTER 2 - SECTION 7
Some Limitations of Synthesis Tools
Synthesis tools have improved dramatically over the last few years.
In fact, the improvements almost make one wonder if there is still a
need for engineers to make critical logic decisions! Well, as it
happens, even the best synthesis tools cannot know everything, and
some of the things that they don't know CAN HURT YOU!
Generally speaking, the two most well known synthesis tools used
by engineers are perhaps the Synopsys Design Compiler tool and
the Cadence Build Gates tool. These vendors advertise that
these tools can synthesize up to about 200,000 gates in one run. No
doubt the tools actually can do this, but the question remains: "is it
wise to synthesize such a large piece of logic in one run?" LSI
Logic recommends that designers synthesize blocks in the 30K to
50K gate range in one run. When blocks get much larger than
this, they may create logic which will occupy large sections of
the die. If these synthesized areas are too big, then guiding the
synthesis process with useful custom wireload information can be
difficult.
CHAPTER 2 - SECTION 8
I/O Cells
Ten years ago, I/O cells consisted basically of input buffers, output
buffers of various current drive capabilities, and three-statable and
bidirectional buffers. Today, the number of choices has grown
beyond our ability to memorize all of the types or names. We now
have mixed signal I/O with differential signal capability. We have
level translators to go from one voltage level to another on chip. We
have specialized GTL and NTL buffers, PCI buffers, the RAMBUS
interface I/O, very high speed cells, LVTTL cells, LVDS cells, and
on and on.
CHAPTER 2 - SECTION 9
ESD Cells
It has always been true that CMOS ICs in particular have needed
special protection in the I/O area to prevent electrostatic damage to
the very thin gate oxide regions of the die. Chips have gotten
bigger, the types of I/O cells have gotten more varied, mixed
voltage buses now share regions over the I/O cells, design
speeds have increased, causing more simultaneous switching, and
package materials and styles have become more diverse. The ESD
protection circuitry has had to evolve to keep pace with a much
larger variety of options at the periphery of the die. LSI Logic
currently has two different types of ESD cells located in the I/O
region of the chip. These cells provide protection for both the I/O
and Core regions of the chip. Since ESD damage can occur when
the chip is powered down sitting on the storage shelf, it is even
necessary to protect against ESD damage between two ground pins
which are normally assumed to both be at 0 volts! One would not
intuitively know that protection circuitry was needed between two
different ground pins on the package (VSS and VSS2). But of
course when we consider that in a power down situation there may
be sneak paths through oxides when an ESD voltage builds up
between two ground pins, then we have to design circuitry to
protect against this case. And indeed, we do keep many signals such
as VSS and VSS2 separate on the die for good reasons. But when
we do this, it introduces the potential for damage that is
non-obvious.
LSI Logic has designed two different types of ESD cells for the two
major regions of the die - the core and the I/O area. And of course
there are many variations of the I/O cell to accommodate the many
types of I/O cells which could all possibly be tied to their own cut
section of power bus.
The ESD cells occupy space and must be planned for in an ASIC
design because they add necessary reliability to the chip. In this
sense, they act as another type of I/O cell, since they occupy space
around the edge of the die, and one must obey certain rules about
spacing between ESD cells and to local power and ground pads.
There must be enough of them on the die to prevent parasitic
resistance from nulling out their effectiveness.
3
Design for Speed
CHAPTER 3 - SECTION 1
Reduce the Effect of Wire Delays
In current deep-submicron technologies, delays resulting from
wire lengths have a considerably greater relative effect on effective gate
delay than loading due to other cells. Thus it is imperative that you: 1)
properly floorplan your chip, 2) optimize drive strengths by using
high-drive cells when necessary, and 3) partition your logic in such a
way as to minimize the length of interconnect between cells.
The total loading capacitance is the sum of the output capacitance, the
wire capacitance, and the sum of input capacitances of the driven
functions. In both case (a) and case (b) in Figure 3.1, the sum of the
input loads is 12; but CW2 in case (b) represents the wire capacitance
to 12 places. (The term “input load” does not mean input capacitance,
Figure 3.1 - Capacitance Due to Cells vs. Capacitance Due to Wires (case (a): COUT drives wire capacitance CW1 and two inputs CIN of IL=6 each; case (b): COUT drives wire capacitance CW2 and 12 inputs CIN of IL=1 each)
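The sum described above can be written out as a short calculation. In this sketch one input load is treated as one unit of capacitance purely for illustration, which the text cautions is not literally what the term means. The output capacitance and the two wire capacitance values are invented numbers, chosen only so that the wire reaching 12 places is the larger one.

```python
def total_load(c_out, c_wire, input_caps):
    """Total loading = output capacitance + wire capacitance + sum of input capacitances."""
    return c_out + c_wire + sum(input_caps)


# Assumed values in arbitrary capacitance units.
C_OUT = 1   # output capacitance of the driving cell
C_W1 = 2    # case (a): short wire to two destinations
C_W2 = 12   # case (b): wire routed to twelve destinations

case_a = total_load(C_OUT, C_W1, [6, 6])     # two inputs of load 6
case_b = total_load(C_OUT, C_W2, [1] * 12)   # twelve inputs of load 1

if __name__ == "__main__":
    print("case (a):", case_a)
    print("case (b):", case_b)
```

Both cases present the same summed input load of 12, so any difference between the totals comes from the wire term alone.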
CHAPTER 3 - SECTION 2
Use Complex Cells Where Appropriate
There is no such thing as a non-inverting CMOS logic gate. There
are only double inverting logic gates, and a few four or six stage
gates. There are, however, complex cells which perform multiple
levels of logic gating with only one inversion. With so much delay
in the wires, they can be faster than interconnected gates, but their
size, compactness, and high pin count can cause routing congestion.
CHAPTER 3 - SECTION 3
Feed Late-Occurring Signals into the Last Logic Levels
Place early-occurring signals in the first levels of your logic and
late-occurring signals in the last levels, as shown in Figure 3.2. This
allows the early-occurring signals time enough to set up on the last
stage. The later signal needs to propagate through only one low
fanin gate.
Figure 3.2 - Comparison of 2-input NAND and NOR Gates (labels: early-occurring signals A, B, C, D, E feed the first logic levels, giving ABC and then ABCDE; late-occurring signal F feeds the last gate, giving ABCDEF)
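The benefit of this ordering can be checked with a simple arrival-time calculation. The sketch below assumes a three-level chain like the one the figure labels suggest, a fixed delay of one time unit per gate, early signals arriving at time 0, and a late signal arriving at time 5. All of these numbers and the function names are illustrative assumptions.

```python
GATE_DELAY = 1  # assumed delay of every gate, in arbitrary time units


def gate_out(arrivals, delay=GATE_DELAY):
    """A gate's output settles one gate delay after its latest input arrives."""
    return max(arrivals) + delay


def three_level_chain(level1_inputs, level2_extra, level3_extra):
    """Arrival time at the output of a three-level chain of gates."""
    t1 = gate_out(level1_inputs)
    t2 = gate_out([t1] + level2_extra)
    return gate_out([t2] + level3_extra)


EARLY, LATE = 0, 5

# Recommended: the late signal enters at the last level.
late_at_end = three_level_chain([EARLY] * 3, [EARLY] * 2, [LATE])

# Not recommended: the late signal enters at the first level.
late_at_start = three_level_chain([LATE, EARLY, EARLY], [EARLY] * 2, [EARLY])

if __name__ == "__main__":
    print("late signal in last level: ", late_at_end)
    print("late signal in first level:", late_at_start)
```

With these numbers the output settles at time 6 when the late signal feeds the last gate, and at time 8 when it has to pass through all three levels.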
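Figure 3.3 - Comparison of 2-input NAND and NOR Gates (labels: an 8-input NAND, ND8, of A through H compared with an ND4 of A, B, C, D and an ND3 of E, F, G whose results ABCD and EFG are combined with H; output ABCDEFGH)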
CHAPTER 3 - SECTION 4
Use Shift Counters in Place of Binary Counters
A Johnson counter is simply a shift register with the QN output of
the last stage connected to the data pin of the first stage. Johnson
counters are fast because there is no gating between the flip-flops in
the counter. However, a Johnson counter will use more flip-flops
than a binary counter. An n-stage Johnson counter gives a count
sequence of 2 x n counts. Thus, for a modulo eight counter, a
three-stage Johnson counter (six counts) won’t be adequate and a
fourth stage is needed, whereas a binary counter needs only three
flip-flops. The trade-off is speed versus gate count.
Qa Qb Qc
0  0  0
1  0  0
1  1  0
1  1  1
0  1  1
0  0  1
Figure 3.4 - 3-stage Johnson Counter with Count Sequence (three FD2 flip-flops with outputs Qa, Qb, Qc in a shift chain sharing CLK and CLEAR)
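The count sequence can be reproduced with a short behavioral model. This sketch simply implements the description above (a shift register whose first stage is loaded with the inverted output of the last stage) and assumes the counter starts from the cleared, all-zero state; the function name is hypothetical.

```python
def johnson_sequence(n_stages):
    """Return the list of states visited by an n-stage Johnson counter.

    Each state is a tuple (Qa, Qb, ...). On every clock the register shifts
    by one stage and the first stage takes the inverse (QN) of the last stage.
    """
    state = [0] * n_stages          # state after CLEAR
    seen = []
    while tuple(state) not in seen:
        seen.append(tuple(state))
        state = [1 - state[-1]] + state[:-1]
    return seen


if __name__ == "__main__":
    for qa, qb, qc in johnson_sequence(3):
        print(qa, qb, qc)
    print("counts with 4 stages:", len(johnson_sequence(4)))
```

Three stages give six states and four stages give eight, in line with the 2 x n rule.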
CHAPTER 3 - SECTION 5
Use Duplicate Logic to Reduce Fanout and Increase Speed
You can reduce fanout and increase the speed of your circuit by
allowing one of the outputs of the redundant logic to drive the
critical path while the other output drives the non-critical path, as
shown in Figure 3.5. Use this technique only if the timing skew
introduced by separating the two paths will not create race
conditions in the circuit.
Figure 3.5 - Duplicate Logic to Increase Speed (panels: Gates, Flip-Flops; in each, an original ND3 or FD2 with fanout=10 on a speed-critical path is duplicated so that one copy drives fanout=9, non-speed critical, and the other drives fanout=1, speed critical)
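A rough feel for the gain can be had from a linear delay model. The sketch below assumes that a cell's delay is an intrinsic delay plus a fixed increment per fanout; the 100 ps and 20 ps figures are invented for illustration and do not come from any LSI Logic library. It also ignores the extra load that the duplicated cell places on whatever drives it.

```python
INTRINSIC_PS = 100   # assumed unloaded cell delay, in picoseconds
PER_FANOUT_PS = 20   # assumed added delay per driven input


def cell_delay_ps(fanout):
    """Linear delay model: intrinsic delay plus a per-fanout increment."""
    return INTRINSIC_PS + PER_FANOUT_PS * fanout


original = cell_delay_ps(10)        # one cell drives all ten loads
critical_copy = cell_delay_ps(1)    # duplicate drives only the critical load
noncritical_copy = cell_delay_ps(9) # other copy drives the remaining nine

if __name__ == "__main__":
    print("original, fanout 10:     ", original, "ps")
    print("critical copy, fanout 1: ", critical_copy, "ps")
    print("other copy, fanout 9:    ", noncritical_copy, "ps")
```

The difference between the two copies' delays is the timing skew that the text warns must not create race conditions.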
CHAPTER 3 - SECTION 6
Use Input of Floorplanning
As newer technologies emerge, component speeds get faster and
faster. This is good. However, die sizes are getting larger and larger
and the average length of any given wire in a design can potentially
be longer. Smaller geometries carry with them increased channel
resistance which tends to reduce charge current. In addition to this,
power supply values are dropping quickly. After almost 30 years,
the standard 5.0 V supply is now giving way to 2.5 V, 1.8 V, 1.2 V,
and yes even 1.0 V! Once again this is good in that the power
dissipation can be reduced. However this reduced supply voltage
also has the side effect of reducing available charge current.
The above changes demand that designers take more care in the
layout of their designs. The best way to manage a design layout at
the design level is to be involved “hands on” with the design
floorplan.
Figure 3.6 - Better Wire Models After Cell Placement (panels: (a) Floorplan with no Cells Placed; (b) Floorplan with Cells Placed)
4
Designing for Reliability
CHAPTER 4 - SECTION 1
Use Synchronous Design
Asynchronous circuits may cause problems, especially when (spike-
prone) combinational logic is used to clock or reset storage elements.
LSI Logic strongly recommends that your designs be synchronous.
Figure 4.1 - Classic Prop-Delay Based Pulse (Glitch) Generator (panels: (a) Pulse Generator with input A and output Z; (b) Waveforms from (a), showing the glitch width)
Note that the circuit in Figure 4.1 should not be used in LSI Logic
ASIC designs. The pulse width in the above circuit would appear to
be equal to the cumulative propagation delay of the string of
inverters. However, a distributed-capacitance model of the circuit
reveals that the pulse width on the output of the NAND gate is
equal to the difference in propagation delay between the inverter
string and the wire beneath that inverter string. During pre-layout
simulation, this circuit may perform satisfactorily. However, if the
wire should be longer than expected after layout, the resulting
increase in capacitance will cause the delay in that wire to increase
proportionately. Increased voltage, decreased temperature, and
process variation will decrease the delay of the inverter string. If the
wire delay equals that of the inverter string, no output pulse will be
generated; asynchronous pulse generators such as this can cause
glitches. Use a synchronous pulse generator as shown in Figure 4.2
in place of an asynchronous one.
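The dependence of the pulse width on layout can be seen with a small numeric sketch. The Python below is illustrative only: the function name and every delay value are hypothetical, and the 0.75 derating factor standing in for voltage, temperature, and process effects is an assumption, not LSI Logic data.

```python
def glitch_pulse_width(inverter_delays_ns, wire_delay_ns):
    """Pulse width at the NAND output, in ns.

    Modeled as the inverter-string delay minus the delay of the wire
    that bypasses the string; zero means no pulse is generated.
    """
    string_delay_ns = sum(inverter_delays_ns)
    return max(0.0, string_delay_ns - wire_delay_ns)


# Hypothetical pre-layout estimate: three inverters and a short wire.
pre_layout = glitch_pulse_width([0.5, 0.5, 0.5], wire_delay_ns=0.25)

# Hypothetical post-layout case: a faster inverter string (assumed 0.75
# derating) and a longer wire whose delay now equals the string delay.
fast_inverters = [d * 0.75 for d in [0.5, 0.5, 0.5]]
post_layout = glitch_pulse_width(fast_inverters, wire_delay_ns=1.125)

print(pre_layout)   # a pulse is present before layout
print(post_layout)  # the pulse has vanished after layout
```

A circuit whose function depends on the first number staying positive has no margin against routing changes, which is the argument for the synchronous alternative.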
Figure 4.2
Synchronous Pulse Generator
Figure 4.3
Synchronous Pulse Generator Timing Diagram
CHAPTER 4 - SECTION 2
Avoid “Glitch” Generators
An inconsistency in signal timing produces what is conventionally
called a “glitch”. Figure 4.4 shows how a glitch might look
compared to a signal that is behaving predictably.
(Traces: Expected Behavior; Glitch Behavior.)
Figure 4.4
Glitch Behavior
Figure 4.5
Glitch Generating Mux
Figure 4.6 is the timing diagram for the circuit in Figure 4.5.
To see how the glitch is generated, assume that inputs D0, SEL, and
D1 are initially high. The output therefore is low. When SEL makes a
high-to-low transition, neither D0 nor D1 is selected for a period of
time. During this time, the output Z will glitch high regardless of
the states of D0 and D1. The glitch width is equal to the time it
takes G1 in Figure 4.5 to propagate a 1 to its output.
Note that the low-to-high transition of SEL in Figure 4.6 will not
cause a glitch on G2’s output.
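The mechanism described above can be reproduced with a discrete-time sketch. The Python below is a hypothetical model, not the gate-level circuit of the figure: it assumes an inverting AND-OR mux in which the complement of SEL arrives `inv_delay` time steps after SEL itself, and that delayed path stands in for the gate whose propagation time sets the glitch width.

```python
def simulate_mux(sel_wave, d0=1, d1=1, inv_delay=2):
    """Return the Z output, one value per time step, of an inverting mux.

    sel_wave  : list of 0/1 SEL values, one per time step
    inv_delay : steps by which the complement of SEL lags SEL (assumed)
    """
    z = []
    for t, sel in enumerate(sel_wave):
        # The complemented select still reflects an older value of SEL.
        old_sel = sel_wave[max(0, t - inv_delay)]
        sel_n = 1 - old_sel
        # Inverting AND-OR: Z is low whenever a selected input is high.
        z.append(1 - ((d0 & sel) | (d1 & sel_n)))
    return z


falling = simulate_mux([1, 1, 1, 0, 0, 0, 0, 0])
rising = simulate_mux([0, 0, 0, 1, 1, 1, 1, 1])

print(falling)  # Z pulses high for inv_delay steps after SEL falls
print(rising)   # Z stays low when SEL rises
```

In this model the high-to-low transition leaves both inputs deselected for `inv_delay` steps, while the low-to-high transition briefly selects both, so only the falling edge glitches.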
Figure 4.6
Timing Diagram of Mux Output Glitch
Figure 4.7
The Glitch Doesn’t Matter Here
CHAPTER 4 - SECTION 3
Use Care with Gated Clocks
Designers must use great care in the use of gated clocks since
improper timing of the gated signal can generate a false clock,
causing the flip-flop to clock in the wrong data. The circuit shown
in Figure 4.8 has a gated clock signal.
Figure 4.8
A Gated Clock
Again, gated clocks must be used with extreme caution, and require
very restrictive timing tolerances between the clock and the gate
signal. If you are unable to guarantee such tolerances, do not use
gated clocks. Moreover, because of skew between different local
clocks, gated clocks may cause hold-time violations. Interfaces
between local clock signals should be carefully monitored.
Figure 4.9
Controlling Data Instead of Clock is Safer
CHAPTER 4 - SECTION 4
Avoid Race Conditions
Race conditions occur when the same signal follows two or more
paths having different delays to a common circuit element, or when
two signals caused by the same event (say a clock) follow different
paths to the same circuit element. (Race conditions sometimes
create a condition called “reconvergent fanout.”) Avoid designs
which function efficiently only if there is a predictable interval
between signals traversing different delay paths. This interval is
seldom predictable; pulse widths and delays vary widely due to
variations in layout, processing, power supply, and temperature.
Even where multiple paths have the same number of elements,
variations in fanout and wire delay will cause timing differences in
the different paths. As a result, a circuit that is subject to race
conditions is bound to be unreliable. Examples of race conditions to
be avoided are shown in Figures 4.10 through 4.13.
Figure 4.10
A Race Condition
Figure 4.11
Waveforms for Figure 4.10
In Figure 4.10, the Q output sets the flip-flop when a zero is clocked
into the flip-flop, creating the narrow negative pulse on the Q output
as shown in Figure 4.11. The output pulse width depends on the Q
output loading. Under heavy loading, the pulse may set the flip-
flop, which is close to its source, but then be so attenuated by a long
wire that it is invisible to its other fanouts.
Figure 4.12
Another Race Condition
Figure 4.13
Waveforms for Figure 4.12
CHAPTER 4 - SECTION 5
Avoid Floating Nodes on Internal Three-State Buses
A floating node occurs when all of the buffers driving a bus are
disabled, as shown in Figure 4.14. In general, floating nodes should
not be used in LSI Logic ASICs. (Note, however, that a node
floating for a period of time equal to the delay of a 2-input NAND
gate with a fanout of four IV gates and two millimeters of wire is
acceptable.) You can avoid floating nodes on internal three-state
buses by adding a buffer to drive the bus when all of the other
buffers are off. (See Figure 4.15.)
(With S0=1, S1=1, and S2=1, the bus is “Floating”.)
Figure 4.14
Avoid Floating Nodes
Figure 4.15
Floating Node Avoided w/ Extra Buffer/Decode
Again, to correct this situation, you can add another buffer to drive
the bus, as shown in Figure 4.15.
The enable signal for the extra buffer in Figure 4.15 is equal to the
NAND function of the three original enable signals. Decoding these
multiple enable signals ensures that at least one buffer drives the
bus at all times. Take care to control the enable signals so as to
prevent bus contention. In some logic schemes, a bus may float for
a brief time, but this should be minimized as much as possible. LSI
Logic does not allow bus contention on our test hardware.
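The effect of the extra decode can be checked exhaustively with a short sketch. The Python below assumes active-low enables (inferred from Figure 4.14, where S0=S1=S2=1 leaves the bus floating); the function name is hypothetical.

```python
from itertools import product


def bus_drivers(s0, s1, s2):
    """Count the buffers driving the bus for one enable combination.

    Enables are assumed active-low. The extra buffer's enable is the
    NAND of the three original enables, so it turns on only when all
    three original buffers are off.
    """
    extra_enable_n = 1 - (s0 & s1 & s2)
    enables_n = [s0, s1, s2, extra_enable_n]
    return sum(1 for enable_n in enables_n if enable_n == 0)


for s0, s1, s2 in product((0, 1), repeat=3):
    count = bus_drivers(s0, s1, s2)
    status = "ok" if count == 1 else "CONTENTION"
    print(s0, s1, s2, count, status)
```

No combination leaves the bus undriven, but any combination with more than one original enable active still produces contention, which is why the enable signals themselves must be controlled.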
CHAPTER 4 - SECTION 6
Use Bidirectional Output Buffers with Pull-Up or Pull-Down Resistors to
Increase the Noise Margin
Figure 4.16 illustrates the use of bidirectional output buffers with
pull-up or pull-down resistors.
Pull-up and pull-down resistors tend to pull the signal away from
the threshold of the pad in question making noise less likely to
propagate back into a design, and thereby tending to reduce the
possibility of data corruption in storage elements.
Figure 4.16
Bidirects with Pullup and Pulldown
CHAPTER 4 - SECTION 7
Avoid Excessive Fanout
A logic element driving many elements is said to have a high fanout
and drive a heavy load. In an HCMOS circuit there is no theoretical
limit to how heavy a load a gate can drive. If a gate fans out even to
1000 places, the output will eventually reach its final state, albeit
after a very long delay. For practical purposes, the rise or fall time of an
internal node ought not to exceed about 1.0 ns under nominal
conditions.
(Curves: VOUT versus Time for an IV with FO=10, FO=20, and FO=50, with TL marked for each.)
Figure 4.17
Excessive Power Consumed Due to Excessive Fanout
The graph in Figure 4.17 shows that the ramp time of a gate's output
becomes longer as the fanout increases. As the ramp becomes
longer, the driven macrocells spend more time (TL) in the active
region, where both N- and P- channel transistors are ON, and
CHAPTER 4 - SECTION 8
Avoid “Dangerous” Decoding
Figure 4.18
Dangerous Decoding of Terminal Count
Figure 4.19
Decoding a Johnson Counter
Figure 4.20
Alternate Glitch-Free Terminal Count Decode
CHAPTER 4 - SECTION 9
Do Not Use Unbuffered Transmission Gates
Transmission gates have the advantage of higher speeds into light
loads, but they must be used judiciously. Input signals of the
transmission gate are briefly shorted together, and when at least one
of the input signals to a transmission gate multiplexer is the output
of an unbuffered storage element (see Figure 4.21), this may result
in corruption of data in the storage element. All LSI Logic
proprietary flip-flops and latches have buffered inputs and outputs
to prevent such a malfunction. Customer-designed storage devices,
however, may not have this buffering protection. Do not build this
reliability problem into your design. So, if you must use unbuffered
storage elements, do not use transmission gate multiplexers.
Conversely, if you use transmission gate MUXes, then use
buffered outputs on storage devices that feed into these MUXes.
Figure 4.21
Faulty Use of Transmission Gate
CHAPTER 4 - SECTION 10
Avoid Bus Contention
Bus contention occurs when a bus is driven from different sources
at the same time. There are two kinds of bus contention - internal
and external. Internal bus contention can occur on internal three-
state buses. External bus contention occurs when an output buffer is
enabled, and is outputting a logical value that differs from that of
the external driving source (see Figure 4.22, Case I).
Figure 4.22
Avoid Bus Contention
In Case I of Figure 4.22, with Gate 1 driving low and Gate 2 driving
high, a DC current path exists through the upper transistor of Gate 2
and the lower transistor of Gate 1 to VSS (ground). Both transistors
handle abnormal current loads, and the voltage at point F is not
well-defined. The output state of Gate 3 is questionable. External
bus contention can be avoided by monitoring three-state control
signals to determine when the external source is allowed to drive
the pin.
CHAPTER 4 - SECTION 11
Buried Buses: Sometimes Yes, Sometimes No
“Buried” buses that route to many places tend to increase parasitic
capacitance, and result in poor floorplans, inefficient chip
utilization, routing problems for surrounding logic, and faulty die
size estimates. Note that bit-slicing datapaths will help minimize
such buses. Keep buses that route to many places at the top levels of
your design hierarchy so they can be easily seen and planned for.
Figure 4.23
Two Types of Buses
CHAPTER 4 - SECTION 12
Follow I/O Guidelines Correctly
A generic LSI Logic rule is that every VSS pad can support only a
certain number of standard output buffers (4 mA equivalents),
depending on the package you are using. But this rule should not be
misinterpreted. The unacceptable design in Figure 4.24 below
shows 2xN 4 mA equivalents between the two VSS pads. In the
G12 technology, it may not be unrealistic to have one VDD and VSS
pad pair for every six to eight 4mA buffers.
Figure 4.24
Too Many Outputs Between Power/Ground Pads
Figure 4.25
Enough Ground for the Outputs
Do not place clock signal pins, reset pins, preset pins, or other
major control signals between high-drive output buffers and VSS
(see Figure 4.26).
Figure 4.26
Poor Placement of a Clock
In Figure 4.27 the clock pin is properly placed, with a VSS pin
between the clock pin and the high-drive output buffers, or with the
clock pin between two VSS pads. Note that inputs do not cause
nearly as much noise as outputs because they do not drive huge
loads, so having other inputs near the clock is not a problem.
Figure 4.27
Proper Placement of a Clock
CHAPTER 4 - SECTION 13
Power Considerations
In some ASIC designs, minimizing power consumption is a high
priority. In others, it is not. In any case, it is always necessary to
know in advance how much power a chip will consume, since this
affects the operating temperature, which in turn affects:
❑ Speed
❑ Package selection
❑ FIT rate
Speed
CMOS silicon chips slow down as they get hotter. Proper delay
prediction must therefore take junction temperature into account.
And, make no mistake: any temperature specified to any delay
prediction tool is a junction temperature, not an ambient
temperature. Junction and ambient temperature are related by the
equation Tj = (PD)(θ) + Ta, where PD is power in Watts and θ
(Theta) is the thermal resistance of the package in degrees C per
Watt. Since speed is a function of Tj rather than Ta, and the
program does not know Power or Theta, the temperature cannot be
ambient.
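The relationship between junction and ambient temperature is simple enough to sketch directly. The Python below evaluates the equation above; the function name and the numeric values (2 W, 20 degrees C per Watt, 70 degrees C ambient) are hypothetical and chosen only for illustration.

```python
def junction_temperature(power_w, theta_c_per_w, ambient_c):
    """Tj = (PD)(theta) + Ta, all temperatures in degrees C."""
    return power_w * theta_c_per_w + ambient_c


# Hypothetical chip: 2 W in a package with 20 degrees C per Watt,
# operating in a 70 degree C ambient.
tj = junction_temperature(power_w=2.0, theta_c_per_w=20.0, ambient_c=70.0)
print(tj)  # junction temperature to give to a delay prediction tool
```

In this example the junction runs 40 degrees hotter than the ambient, so supplying the ambient value to a delay prediction tool would understate the delays.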
Package Selection
FIT rate
Contributing factors
(Labels: ‘1’ and ‘0’ states, No Current; Switching; Crowbar; fCV².)
Figure 4.28
CMOS Power Consumption
Conserving power
only gates clocks, but also ensures that scan testability issues are
addressed.
Power distribution
When asked to name the most important interconnection on a chip,
most logic designers will immediately answer, “clock.” Actually,
the answer is power and ground. Logic designers tend to take power
and ground for granted; they don't even show up on most
schematics. However, when routing an ASIC, power and ground
come first.
Power isolation
All LSI Logic ASICs are tested for leakage current, both for
reliability reasons (a chip with elevated leakage current will fail
after fewer hours of operation compared to a chip with normal
leakage), and as a means of defect detection. A typical leakage
5
Designing for Testability
CHAPTER 5 - SECTION 1
Testing Redundant Logic
Redundant logic, such as that shown in Figure 5.1, may be used to
enhance reliability. However, unless the redundant logic can be tested
uniquely, the intended enhancement may prove illusory. Figure 5.2
shows a testable version of this circuit.
Figure 5.1
Untestable Redundant Logic
Figure 5.2
Testable Redundant Logic
CHAPTER 5 - SECTION 2
Use Combinational Logic
Combinational logic is easier to test than sequential logic. In
combinational logic, the outputs are always a function of the
current inputs. In sequential logic, the outputs depend on both the
current inputs and the outputs of storage elements driven by inputs
preceding the current ones. For example, in the multiplexer in
Figure 5.3, all possible states can be exercised by presenting
permutations of 1’s and 0’s on the input pins. The test vectors will
not exceed 2^9 (512), and will cover all possible detectable
stuck-at faults.
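Exhaustive testing of a small combinational block can be sketched in a few lines. The Python below is a behavioral stand-in, not the AO2 netlist of Figure 5.3: the select encoding (A as the least significant bit), the output of 0 for unused select codes, and the restriction to stuck-at faults on the input pins are all assumptions made for illustration.

```python
from itertools import product

INPUTS = ["D0", "D1", "D2", "D3", "D4", "D5", "A", "B", "C"]


def mux(v):
    """Hypothetical 6-to-1 multiplexer; v maps pin name to 0 or 1."""
    sel = v["A"] + 2 * v["B"] + 4 * v["C"]
    return v[f"D{sel}"] if sel < 6 else 0


# Every permutation of 1's and 0's on the nine input pins.
vectors = [dict(zip(INPUTS, bits))
           for bits in product((0, 1), repeat=len(INPUTS))]


def detected(pin, stuck_value):
    """True if some vector makes the faulty output differ from the good one."""
    return any(mux(v) != mux({**v, pin: stuck_value}) for v in vectors)


undetected = [(pin, value) for pin in INPUTS for value in (0, 1)
              if not detected(pin, value)]

print(len(vectors))  # number of exhaustive vectors
print(undetected)    # input-pin stuck-at faults the vectors miss
```

Because the output depends only on the current inputs, no initialization sequence is needed before any of these vectors is applied.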
Figure 5.3
Combinational Logic is Easy to Test
CHAPTER 5 - SECTION 3
Add Test Lines to Sequential Logic
Sequential logic is more difficult to test than combinational logic
because it contains storage elements. You can make sequential logic
more testable by adding test lines to logically break feedback loops
between flip-flops.
(State diagram with states 00, 01, 10, and 11; two FD1 flip-flops, A and B; signals CLOCK, DATA, and OUTPUT.)
OUTPUT sequence for each initial state of A, B:
  A, B = 00:  1 1 0 1 0 0 1 0 0 1 0 0 1  (repeat 001)
  A, B = 01:  0 1 0 1 0 0 1 0 0 1 0 0 1  (repeat 001)
  A, B = 10:  1 0 1 0 0 1 0 0 1 0 0 1 0  (repeat 010)
  A, B = 11:  0 0 0 0 1 0 0 1 0 0 1 0 0  (repeat 100)
Figure 5.4
Untestable Sequential Logic
Figure 5.5 shows how adding a test line makes the same circuit
testable. The test line logically breaks the feedback loop. Holding
the test line low can produce a known output after a maximum of
two clock cycles. Then the test line must be brought high. An input
test vector then produces a unique output vector. The test line need
not be a separate external pin on the package; you can derive it from
an unused state of some other combinational logic on the chip. (If
this approach is used, however, take care that your test signal is
stable before you begin your test.) Note that if, in using this
approach, your logic should ever pass through the unused state, the
chip may be put into the test state. Design your chip in such a way
that it cannot be put into test state by accident.
Figure 5.5
Testability Added by “Test” Input (Breaking the Loop)
CHAPTER 5 - SECTION 4
Use Reset Signals to Initialize Storage Devices
Adding reset signals is one of the simplest and most effective ways
to improve the testability of circuits that contain storage elements.
The penalty is the additional wiring and the buffering needed to drive
the fanout. Storage elements require a testing strategy that puts
them into known states. You can initialize storage elements by
clocking in known input data, but by adding a reset signal you can
reduce the number of initialization test vectors.
Since many subsystems (like the one shown in Figure 5.6) use reset
signals in their normal operation, a reset signal usually involves
little or no additional logic. The reset signal can be used to reset all
elements whose state cannot be readily initialized by other means.
For example, the divide-by-three circuit from Figure 5.4 (clock/3)
at the bottom of Figure 5.6 uses the reset signal from the shift
register above it.
Figure 5.6
Use Reset Lines to Initialize Storage Elements
CHAPTER 5 - SECTION 5
Make Logic Observable by Multiplexing Output Pins
In Figure 5.7a, the state of the three FD1’s can be observed by
placing a 3-to-1 multiplexer in front of the output buffer, as shown
in Figure 5.7b.
Figure 5.7
Multiplex Output Pins
CHAPTER 5 - SECTION 6
Decode Input Pins
You can make deeply buried states (e.g., internal registers)
controllable for testing by decoding input pins, as illustrated in
Figure 5.8. You can monitor internal registers by connecting them
to package-pin test points. To increase the number of effective
Figure 5.8
Decode Input Pins
CHAPTER 5 - SECTION 7
Break Up Long Sequential Chains
Long counters, although they may operate satisfactorily in circuit
systems, can add significantly to the number of test vectors needed
for proper testing. For example, the 16-stage counter shown in
Figure 5.9 would require 2^16 (65,536) test vectors merely to
apply a single clock to the control circuitry. Assuming the control
circuitry has 2^12 possible states for thorough testing, about 2.7 x
10^8 test vectors would be needed to test the circuit completely.
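The vector count quoted above follows from a single multiplication, sketched here in Python. The function name is hypothetical; the stage count and the number of controller states are the ones given in the text.

```python
def vectors_to_test(prescaler_stages, controller_states):
    """Vectors needed when every controller clock costs a full
    roll-over of the prescaler that precedes it."""
    vectors_per_controller_clock = 2 ** prescaler_stages
    return vectors_per_controller_clock * controller_states


total = vectors_to_test(prescaler_stages=16, controller_states=2 ** 12)
print(total)
print(f"{total:.1e}")
```

Each additional prescaler stage doubles the total, which is why breaking the chain into shorter, separately accessible counters pays off so quickly.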
(Blocks: 100MHz input clock; 16-Stage prescaler; Low speed controller, about 2^12 states; System inputs; System outputs.)
Figure 5.9
Long Counter is Difficult to Test
(Blocks: Input clock; two 8-Stage prescalers, each followed by an AO2; System inputs; System outputs.)
Figure 5.10
Two Short Counters are Easier to Test
CHAPTER 5 - SECTION 8
Use Level-Sensitive Design
In large designs, unavoidable skew along a clock signal may make
edge-triggered flip-flops unreliable due to hold-time violations. One
way to avoid this problem is to use a level-sensitive design
technique. In Figure 5.11, ENA and ENB are non-overlapping,
active-high enable signals. Signals are transferred into one level of
the design using the ENA signal and into alternate levels using the
ENB signal. In this way, the flow of data through the design can be
staged and very well controlled.
Figure 5.11
Level Sensitive Design
Figure 5.12
Safely Gated Enable Line
CHAPTER 5 - SECTION 9
Partition Logic into Blocks
A large design may contain deeply buried functional blocks which
are both difficult to control and to observe. In such a case, it may be
desirable to divide the design into easily testable blocks. These
blocks may be made functionally transparent while the design is in
normal operational mode, with only a slight increase in delays
between blocks. Figures 5.13 and 5.14 show how this can be done.
Figure 5.13
Block Level Boundary Scan
Figure 5.14
Block Partitioning Using Muxes
CHAPTER 5 - SECTION 10
Do Not Design Your Own Storage Devices
Timing Considerations
Figure 5.15
Customer-Designed Storage Device
Logic Considerations
Figure 5.16
Dangerous Use of a Transmission Gate Mux
CHAPTER 5 - SECTION 11
Use a Serial Scan Approach
Standard scan design uses a one-phase clock. Place a 2-to-1
multiplexer in front of each register, as shown in Figure 5.17.
In this case, each flip-flop receives its data input either from its
normal path or from the Q output of the previous flip-flop in the
scan chain, depending on the state of the test pin. The top flip-flop
is an exception; its input comes from a test input pin. The bottom
flip-flop’s Q output goes both to its normal destination and also to a
test output pin.
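The shift and capture behavior described above can be modeled with a list of flip-flop states. The Python below is a behavioral sketch only: the function names, the three-flop chain length, and the use of simple inversion as a stand-in for the combinational logic are all hypothetical.

```python
def clock(state, test, test_in=0, normal_d=None):
    """Apply one clock edge. state[0] is the top flip-flop's Q.

    test=1: each flop takes the previous flop's Q; the top takes test_in.
    test=0: each flop captures its normal data input.
    """
    if test:
        return [test_in] + state[:-1]
    return list(normal_d)


def load(state, bits):
    """Shift a sequence of bits in through the test input pin."""
    for bit in bits:
        state = clock(state, test=1, test_in=bit)
    return state


def unload(state):
    """Shift the chain out, recording the bottom Q (test output pin)."""
    observed = []
    for _ in range(len(state)):
        observed.append(state[-1])
        state = clock(state, test=1, test_in=0)
    return observed


loaded = load([0, 0, 0], [1, 1, 0])
# Stand-in for the combinational logic: invert every flop output.
captured = clock(loaded, test=0, normal_d=[1 - q for q in loaded])
observed = unload(captured)

print(loaded)    # chain contents after three shift clocks
print(captured)  # contents after one normal-mode clock
print(observed)  # bits seen at the test output, bottom flop first
```

The first bit shifted in ends up in the bottom flip-flop, and the bottom flip-flop is the first one observed, so both the load and unload sequences are reversed relative to the chain order.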
Figure 5.17
Scan Design
CHAPTER 5 - SECTION 12
Multiplex Output Pins to Make Scan Chain Observable
Pin requirements need not be increased for circuits using scan
design since scan outputs can be multiplexed with regular output
signals, as in Figure 5.18. Do not, however, multiplex critical path
outputs.
Figure 5.18
Sharing the Scan Output
Note that several scan flip-flops are available in LSI Logic libraries.
(The circuitry enclosed within a dotted line in Figure 5.18 and
identified as an FD1S is such a flip-flop.) Each scan flip-flop has a
test input (TI) and test enable (TE) input in addition to its normal
inputs. Scan test flip-flops have an “S” suffix on their macrocell
names.
CHAPTER 5 - SECTION 13
Implementing the Vil/Vih (NAND) Tree
The LSI Logic NAND tree is a dedicated circuit for testing each
input buffer's Vil/Vih. It is formed from a special NAND gate
embedded within each input or bidirect buffer. One input of the
NAND gate is connected to the output of the buffer, and the other is
given the port name PI. The output of the NAND gate is referred to
as PO. This means that, for a non-inverting input buffer, the PI pin
is active when the input signal is at 1 and inactive when the input
signal is at 0. One unidirectional output buffer is chosen as the
NAND tree output. This buffer is driven, through a MUX if desired
so as to avoid having a dedicated NAND tree output pin, by the
nearest PO pin. The corresponding PI pin is then connected to another
PO pin, and the process is repeated until all PO pins are connected.
This leaves one unconnected PI pin, which is tied to logic 1. The
result is a daisy-chain of NAND gates, all leading to a single output
buffer. There is now a purely combinational path from each input
buffer to a single output buffer.
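The daisy chain described above can be modeled as a fold over the input buffer outputs. The Python below is a sketch under assumptions: the function name is hypothetical, four non-inverting input buffers are assumed, and the stimulus (all inputs low, then raised one at a time starting nearest the output) is one common way to exercise such a chain, not necessarily the sequence LSI Logic uses.

```python
def nand_tree_output(buffer_outputs):
    """Value at the NAND tree output.

    buffer_outputs[0] belongs to the buffer nearest the output; the
    PI pin of the farthest buffer's NAND gate is tied to logic 1.
    """
    pi = 1
    for value in reversed(buffer_outputs):
        pi = 1 - (value & pi)  # PO = NAND(buffer output, PI)
    return pi


inputs = [0, 0, 0, 0]
trace = [nand_tree_output(inputs)]
for position in range(len(inputs)):
    inputs[position] = 1  # raise one more input, nearest the output first
    trace.append(nand_tree_output(inputs))

print(trace)  # the output toggles once for each input raised
```

Because every step toggles the single output, an input buffer whose threshold is out of specification shows up as a missing transition.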
The Vil/Vih test is performed with each input at its own Vil-max or
Vih-min. This leaves no noise margin at all. Any storage element
within the circuit would be susceptible to the first noise spike that
came along.
(Blocks: Input Buffers; Normal Circuit; Normal Output (see note 1); Procmon; TESTOUT output pin.)
1. All outputs, except TESTOUT, are masked during the DC parametric tests.
Figure 5.19
NAND Tree Structure
Figure 5.20
NAND Tree Sequence
The test must also take into account special cases such as Schmitt
triggers (where direction and sequence are important; they have to
be toggled both ways), and other special buffer types such as PCI.
Figure 5.21
NAND Tree Test
Subtleties
The NAND tree has some surprising subtleties for something that
seems at first glance to be so simple. These subtleties arise not from
the NAND tree itself, but from the input buffers which drive it. The
two most obvious problems are the bidirects and the MUX. How
can we have a purely combinational circuit whose output is a
function of all the inputs, and still guarantee that the bidirects will
not suddenly go from input mode to output mode? After all, if we're
testing the voltage response of the input half of the bidirect, we
can't very well have it slip into output mode. But one of the other
inputs that we need to toggle would have to be holding the bidirects
in input mode, and if we toggle that input to test it, we risk
switching the bidirects to output mode. What must happen is that
the global three-state enable pin must appear in the NAND tree
closer to the output than any bidirect buffer, and its sense must be
such that the bidirects are forced to input mode when the PI pin is
active.
input buffers.
Figure 5.22
A Complete NAND Tree Circuit
The PROC_DRV
The IIDDTN
The PROCMON2A
Yet another wrinkle in the NAND tree saga is the fact that the G12
technology has two different kinds of transistors, High Performance
(HP) and Low Leakage (LL). Thus the need for two different kinds
of PROCMONs and, for designs that contain both kinds of
transistors, a third, called PROCMON2A, that contains both of the
other two, along with an extra input (SEL) to choose between them.
In non-pulse mode, the path from the A pin through the low leakage
transistors is non-inverting, to differentiate it from the high
performance path and ensure that the input buffer driving the SEL
pin can cause a transition at the output and thus be tested for Vil/
Vih. Since the state of the SEL pin is only important during test
mode (presumably reset, as suggested earlier), the choice of which
input to hook to the SEL pin is not important but, as before, it
should be the PO pin from the input buffer, with the PI pin tied to
logic one.
lsints
Since there are always people who want to make extra work for
themselves, not to mention the occasional redesign, the program
has a “pattern_only” mode which will create test patterns but not a
netlist, which of course assumes that the NAND tree already exists
in the circuit. There are some discrepancies between the UCF file
for “full” mode and the UCF file for “pattern_only” mode. In “full”
mode, the arguments for procmon_npin and procmon_zpin are
identical. In “pattern_only” mode, the argument for procmon_npin
must be an internal net name. Also, in “full” mode the
“procmon_apin” command is ignored, but in “pattern_only” mode
it is required.
(1) = Q, (2) = CLK, (3) = DATA, (4) = Q
6
Automatic Tester Requirements
CHAPTER 6 - SECTION 1
Overview
Your device will be tested at two different stages during the
manufacturing process: once at wafer probe, and once at final test
after the device has been packaged. During both testing sequences,
the automatic tester uses the test patterns which you created during
tester functional and parametric simulations. The tester verifies the
correct operation of each circuit by clocking in test patterns, and then
checks to see whether the output patterns are identical to the ones
predicted by simulation. The tester also verifies static and switching
parametric values on the pins of the device.
Information about how the testers used at LSI Logic operate can help
you to decide the best way to create test patterns for your circuit.
CHAPTER 6 - SECTION 2
Test Flow
The types of tests performed by the tester and the sequence in
which they occur are shown in Figure 6.1. When LSI Logic tests a
device, the tests are executed in the order shown. This
method reduces test time by identifying the most common failure
modes early in the sequence. Most manufacturing defects will be
detected in the first three tests.
1. Gross IDD
2. Gross Functional
3. Three-State
4. VIL/VIH
5. PROCMON
6. Functional
7. Static IDD
8. VOH
9. VOL
10. IIL/IIH
11. IOZ
Figure 6.1
Test Flow
CHAPTER 6 - SECTION 3
Automatic Tester Configuration
LSI Logic currently uses a number of different automatic testers:
These Production Testers can handle pin counts up to 784 pins, can
run at speeds up to 100MHz, and can handle up to 8 scan chains.
Some of the testers are also set up to provide mixed mode capability.
These testers physically apply voltage and/or current to the device
under test, measure the voltage and current levels on the inputs and
outputs, and measure propagation delays through representative
paths.
(Traces: Input; Output; Strobe.)
Figure 6.2
Fundamentals of Tester Operation
Figure 6.3 depicts how the automatic tester applies input patterns
and measures outputs of the device under test. The shaded box at
the left labelled A represents the local memory in the tester as well
as the test patterns contained in the local memory. The patterns
from the input columns are taken out of the local memory and
applied to the device under test one pattern at a time. At the same
time, the patterns from the output columns are compared with the
output pins of the device under test.
A single binary bit in the local memory does not contain sufficient
information for the tester to apply the correct waveform to the
device under test. Therefore, a waveform generator is needed to
generate the proper waveform characteristics. All LSI testers have
at least six such generators, referred to as input timesets.
(Blocks: TESTER LOCAL MEMORY; waveform types RZ, DNRZ, and NRZ; AUTOMATIC TESTER.)
Figure 6.3
Automatic Tester Equivalent Diagram
(Signals: CLK1, CLK2, CNTL1, CNTL2, CNTL3; timesets tg1 through tg5.)
Figure 6.4
Example Showing Five Timesets
CHAPTER 6 - SECTION 4
Test Period Length
The minimum test period size must be greater than or equal to two
parameters:
Figure 6.5
Maximum Propagation Delay Path
Figure 6.6
Defining a Shorter Test Period
T0 ≥ Tester minimum
T0 ≥ Tpd MAX + Tester Guardband
Figure 6.7
Minimum Cycle Time
period. MAT is an output that in the worst case reaches steady state
10.0 ns after the active edge of CLK. Adding another 2X + 4 ns
tolerance demands a minimum test period size of 27 ns.
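The arithmetic behind the 27 ns figure can be sketched as follows. This is an illustration only: the 3.0 ns clock-edge offset and the value X = 5 ns are assumptions read from the labels of Figure 6.8 (3.0 ns, 10.0 ns, and 14 ns), and the function and parameter names are hypothetical.

```python
def minimum_test_period_ns(clk_edge_ns, tpd_max_ns, x_ns, extra_ns=4.0):
    """Clock-edge offset plus worst-case output delay plus tolerance.

    The tolerance is taken as 2X + 4 ns, as in the text; X itself is
    an assumed value here.
    """
    tolerance_ns = 2 * x_ns + extra_ns
    return clk_edge_ns + tpd_max_ns + tolerance_ns


period = minimum_test_period_ns(clk_edge_ns=3.0, tpd_max_ns=10.0, x_ns=5.0)
print(period)  # minimum test period in ns under these assumptions
```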
(Labels: CLK, 3.0 ns; MAT, 10.0 ns; 14 ns.)
Figure 6.8
Determining the Minimum Test Period
CHAPTER 6 - SECTION 5
Input Timesets
Using input waveforms, you may specify that input signals change
state either at the beginning of the test period, or after a fixed delay
measured from the beginning of the test period. Alternatively, you
may specify that input signals have a narrow duty cycle within the
period.
The four types of input waveforms are illustrated in Figure 6.9. The
numbers in the figure refer to the following waveform conventions:
(Waveforms 1 through 4 on Input, D1, D2, and D3, with minimum widths of ≥ 4 ns and ≥ 10 ns marked.)
Figure 6.9
Four Types of Input Waveforms
Since they are mechanical devices, the automatic testers are neither
as stable nor as precise as the simulation results. When two or more
signals change at the same time, the simulation will accurately
predict circuit operation, but it will not analyze the effect of any
tester head skew (time difference) between the input signals. Even
if you specify the input signals in the same timeset, the tester may
not be able to apply them at exactly the same time. The skew
between any two input pins may be up to Y ns. If the sequence of
certain input signals is critical, either apply them at different
periods, or separate them in different timesets with sufficient delay
between them (normally X ns).
For ideal testing conditions, all inputs should be NRZ type with
data pins changing on the non-active edge of the clock pins, as
illustrated in Figure 6.10.
(Traces: Data; CLK.)
Figure 6.10
Idealized Input Test Patterns
(Test periods 1 through 3, shown for RZ, RTO, and DNRZ.)
Figure 6.11
Legal Timeset Initialization
The first test vector of each set of patterns must be an X for every
input pin and a Z for each bidirectional pin. An example of legal
timeset initialization patterns is shown in Figure 6.11. Undriven
non-bidirectional inputs should have a value of either 0 or 1. The X
or Z values are required only on the first pattern of the set (time 0).
Except for zero delay NRZ, input value changes are not legal at
cycle boundaries, and different input timesets must not have the
same delay value.
Bidirectional pin conflicts are not allowed during functional or
three-state test patterns at any time.
In most situations, you can generate all of the test patterns using
only NRZ waveforms for all input pins. However, if the number of
patterns required to test the part is larger than about 32K, use
DNRZ, RZ, or RTO types of waveforms for test pattern
compaction.
Figure 6.12 shows a set of ten test patterns using only NRZ
waveforms. In Figure 6.13, the same set of test patterns is
compacted by using an NRZ waveform for the DATA in, and an RZ
waveform for the CLK in. This arrangement uses only five test
patterns.
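To see why the pattern count halves, consider the sketch below. It assumes that in the NRZ-only set each clock pulse occupies two patterns (clock low, then clock high) with the data held steady across the pair, which is one plausible reading of the figures; the tuple layout and the 'P' marker for an RZ pulse are illustrative choices, not a tester format.

```python
def compact_with_rz_clock(nrz_patterns):
    """Merge pairs of NRZ patterns into single patterns with an RZ clock pulse.

    nrz_patterns: list of (data, clk) tuples, one per NRZ test period.
    Each pair must hold data steady while clk goes 0 then 1.
    Returns a list of (data, 'P') tuples, where 'P' marks one RZ pulse.
    """
    if len(nrz_patterns) % 2:
        raise ValueError("expected an even number of NRZ patterns")
    compacted = []
    for i in range(0, len(nrz_patterns), 2):
        (data_a, clk_a), (data_b, clk_b) = nrz_patterns[i], nrz_patterns[i + 1]
        if (clk_a, clk_b) != (0, 1) or data_a != data_b:
            raise ValueError(f"patterns {i} and {i + 1} cannot be merged")
        compacted.append((data_a, "P"))
    return compacted
```

Ten NRZ patterns built this way compact to five, matching the reduction described above.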
Figure 6.12
Input Test Patterns Using NRZ Waveforms Only
(Waveform drawing: DATA and CLK over ten test periods, each
starting at T0.)
Figure 6.13
Adding RZ to NRZ Reduces the Number of Patterns
(Waveform drawing: DATA and CLK relative to T0.)
CHAPTER 6 - SECTION 6
Examples of Valid and Invalid Waveforms
Figure 6.14 shows how a set of simulation inputs is replicated by
input timesets. Look at the figure to observe that Inputs DATA 0 to
7 are NRZ waveforms, CLOCK 0 is an RZ waveform with 4.0 ns
Figure 6.14
Simulation Inputs Replicated by Timing Generators
(Waveform drawing: CLOCK 0, ENABLE 1, SELECT 1, and SELECT 2, with
edge timings annotated at 4 ns, 6 ns, and 8 ns.)
Figure 6.15
Simulation Inputs that Can’t be Replicated by Timing Generators
(Waveform drawing: DATA 0 ~ 7, CLOCK 0, ENABLE 1, and CLOCK 1, with
edge timings annotated at 2 ns, 4 ns, 6 ns, and 8 ns and four
problem areas marked Error 1 through Error 4.)
CHAPTER 6 - SECTION 7
Output Strobes
The output strobes are similar to a camera in that they capture or
strobe output states at specified points in time: when the outputs
reach steady state. The captured states are then compared with
simulation results.
LSI Logic uses two output strobes to monitor the circuit’s output
pins in order to measure functionality of your design. They are used
to capture the steady-state results of the output pins at the end of
each test period.
The strobes must be set at the end of the test periods. The minimum
test period size is equal to Tpdmax +TOL ns, where Tpdmax is the
maximum propagation delay within the circuit.
The TOL tolerance for the tester is illustrated in Figure 6.16. The
minimum pulse width of a timeset is X ns. The measured value is
compared with the simulated value during this X ns.
Figure 6.16
Tester Tolerance
(Timing drawing: IN, OUT, and STROBE waveforms compared with the
shifted IN’, OUT’, and STROBE’ waveforms.)
Due to the tester’s tolerance, a timeset must not change state within
X ns from either the beginning or the end of the test period.
Therefore, the trailing edge of the strobe cannot be closer than X ns
from the end of the test period. This accounts for 2X ns out of TOL
allowed for tester tolerance. An additional Y ns is added to strobe
time to compensate for the physical limitations of the tester, as
explained in the next paragraph.
When simulating the chip in software, you can specify that an input
signal make a low-to-high transition 15 ns after the beginning of the
test period. Likewise in the tester, you can program the driver for
this input pin to make a low-to-high transition 15 ns after the
beginning of the test period. However, it takes Y ns for the voltage
level at the output of the driver to reach the input threshold voltage
level of the input buffer on the chip. This tolerance factor must also
be allowed for when determining the minimum cycle time.
CHAPTER 6 - SECTION 8
Rules for Creating Time Blocks
Simulation test patterns are sometimes broken up into time blocks
that run sequentially. The term time block is used in Chapter 7,
Prepare Test Patterns for Simulation, and in this chapter.
Because the terminology is used in a different context in this
chapter, we will refer to time blocks used for simulation as
simulation time blocks, and time blocks used for the tester as tester
time blocks. You may have one or more simulation time blocks for
every set of test vectors in a single tester time block. The following
rules apply to tester time blocks.
At the beginning of the test program, two initial vectors are always
required. One defines the function of all pins at the outset of testing,
and one specifies which output pins will be strobed. Each time you
want to add or delete output pins from the output pins to be strobed,
a new test vector is required. If your circuit contains bidirectional
pins, allow two test vectors for each simulation cycle in which the
function (direction) of those pins will be switched in order to avoid
contention.
If a time block has more than the maximum number of test vectors
allowed for the tester you have selected, you have one of two
choices: either eliminate some test vectors or divide them into two
tester blocks.
The test period size must be the same for each set of test vectors in
a given time block. However, different time blocks may have
Within any set of functional test patterns, all output pins must be
fully stabilized Y ns prior to the beginning of the next test period.
For example, if the slowest output on the chip has a delay of 9.6 ns,
the test period for that particular set of patterns must be at least 9.6
ns + Y ns.
The tester time block for the VIL/VIH, IDD, and three-state patterns
must be 2 µs (2000 ns).
7
Automatic Tester Requirements
CHAPTER 7 - SECTION 1
Overview
Recall from Chapter 6 that the LSI Logic automatic testers use test
patterns created during tester functional and parametric simulations.
The tester verifies the correct operation of each circuit by clocking in
the test patterns, monitoring the output pins, and checking them
against the output patterns predicted by simulation.
• system simulation
• VIL/VIH simulation
• three-state functional simulation
CHAPTER 7 - SECTION 2
Prepare Simulation Test Patterns
This section contains information you should know in order to
prepare simulation test patterns, including system simulation
patterns, tester functional patterns, VIL/VIH patterns, IDD
measurement, and three-state patterns.
Once the desired results are obtained from system simulation, input
patterns should be created for use by the automatic tester. The tester
functional simulation patterns enable the tester to verify the
device’s functionality, measure the propagation delay through the
representative path, and measure some static parametric values
such as output voltage levels (VOL, VOH) and three-state output
leakage current. Functional simulation is performed only on the
top-level module, and should thoroughly exercise the network - i.e.,
cause every node in it to change state, or toggle, both from a 0 to 1
and from a 1 to 0 at least once. Most designers today use full scan
testing to create high fault coverage test vectors, and the IDDQ
tool in the FlexStream toolset allows designers to check their
toggle coverage.
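The toggle requirement can be expressed as a simple measurement over simulated node values. The sketch below is a generic illustration, not the FlexStream tool; the trace format (one 0/1 value per node per test pattern) and the function name are assumptions.

```python
def toggle_coverage(node_traces):
    """Fraction of nodes that make both a 0->1 and a 1->0 transition.

    node_traces: dict of node name -> list of 0/1 values, one per pattern.
    """
    if not node_traces:
        return 0.0
    toggled = 0
    for values in node_traces.values():
        # Collect every (previous, next) pair of consecutive values.
        transitions = set(zip(values, values[1:]))
        if (0, 1) in transitions and (1, 0) in transitions:
            toggled += 1
    return toggled / len(node_traces)
```

A node that only rises, or never changes, does not count as toggled, so a result below 1.0 points to nodes the patterns have not fully exercised.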
timing, and different output strobe times. Different output pins may
be monitored on each block of patterns.
• Non-Return-to-Zero (NRZ)
• Delay-Non-Return-to-Zero (DNRZ)
• Return-to-Zero (RZ)
• Return-to-One (RTO)
Tester threshold simulation patterns are used to test the circuit for
input low voltage and input high voltage. These patterns enable the
tester to measure the circuit for input threshold voltages. The tester
VIL/VIH simulation must be performed under the following
conditions: VDD = <Nominal Value>, TEMP = 25 °C, worst-case
process, and a test period of 2000 ns. Only 0 ns delay Non-Return-
to-Zero (NRZ) signals are legal for the VIL/VIH test patterns.
Figure 7.1
The VIL/VIH NAND Tree
(Circuit drawing: inputs mi, m2, and m1, each also routed to the
circuit, together with VDD, procmon, mode, and the mpout output.)
The VIL/VIH test is executed with the input driver levels set at
VIL (max) and VIH (min). This leaves no noise margin for the
input buffers. While switching, output buffers can generate large
current spikes capable of causing the input buffers to misread
voltage (and therefore logical) levels.
The VIL/VIH test patterns are specific. The first pattern must
consist of 1’s on the input pins for non-inverting input buffers and
0’s on the inputs of inverting input buffers. Then 0’s or 1’s are
applied according to the procedure shown in Figure 7.2 (all inputs
are non-inverting in this example). Note that a circuit containing n
input pins requires at least (n + 1) VIL/VIH test patterns. Testing
the threshold in both the low-to-high and high-to-low directions
may require twice this number of vectors, but the overall vector
length is short and is determined by the number of inputs and
bidirect pins in a design.
For networks that do not have an extra output pin for the VIL/VIH
test circuit, a regular output pin or pins may be used to multiplex a
regular output with the VIL/VIH test output. Figure 7.2 shows VIL/
VIH test patterns. Make sure not to use a critical path output,
however, if you do intend to multiplex a functional output with this
TESTOUT signal.
                       TESTOUT
M1  M2  M3  ...  Mi    i=even  i=odd
1   1   1   ...  1     1       0
1   1   1   ...  0     0       1
1   1   0   ...  0     1       0
1   0   0   ...  0     0       1
0   0   0   ...  0     1       0
Figure 7.2
Example NAND Tree Input Patterns
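The pattern procedure can be sketched in Python. The sketch assumes all inputs are non-inverting and models the NAND tree as a simple chain in which the gate for Mn is tied to VDD and the gate for M1 drives the test output; that topology is an assumption made for illustration, and both function names are hypothetical.

```python
def vil_vih_patterns(n):
    """Return n + 1 patterns for non-inverting inputs M1..Mn.

    The first pattern is all 1s; each later pattern drives one more input
    low, starting from Mn and working back to M1.
    """
    return [[1] * (n - k) + [0] * k for k in range(n + 1)]


def nand_tree_output(pattern):
    """Evaluate the assumed NAND chain for one pattern [M1, ..., Mn]."""
    node = 1  # VDD on the first gate of the chain (assumed)
    for m in reversed(pattern):  # Mn's gate first, M1's gate last
        node = 0 if (node and m) else 1
    return node
```

For four inputs this produces five patterns whose outputs are 1, 0, 1, 0, 1, so the test output toggles on every pattern. That toggling is what lets the tester confirm that each input buffer switched.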
If regular input pins were used to multiplex the output for the VIL/
VIH circuit, a set of supplementary test patterns would have to be
added to the VIL/VIH test patterns. The purpose of these
supplementary test programs would be to check those input pins
that were used to multiplex the VIL/VIH test output.
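Figure 7.3
A NAND Tree with Bidirects
(Circuit drawing: NAND tree inputs M1 through M4, a 5V tie, a 2:1
MUX with inputs A, B, and S driving TOUT, the normal system output
signal, and the TN and TE signals.)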
M1  M2  M3  Mi  TE  TOUT
1   1   1   1   1   0
1   1   1   0   1   1
1   1   0   0   1   1
1   0   0   0   1   0
0   0   0   0   1   1
Put “0” on input B of 2:1 mux:
                0   0
Figure 7.4
Test Patterns for NAND Tree with BiDirects
Only a few patterns are needed to set up the circuit for the static
leakage current test (the IDD test). Set up test conditions in which
all static current is turned off so that static leakage is easy to
measure. The test condition must eliminate the following sources of
static current:
The first two items mentioned above take priority over the others if
all cannot be attained.
To summarize, there are at least four sets of tester patterns that you
must create:
CHAPTER 7 - SECTION 3
Determine the Output Strobe Timing
Recall from Chapter 6 that two strobes are available. Both strobes
allow a direct comparison of expected (simulated) and actual
(measured) outputs. The first strobe is placed near the end of each
test period to test the functional operation of most of the output
pins. The second strobe can be used to get a feeling for the
performance of the chip. Place the second strobe earlier in time than
the first strobe within the cycle to measure a representative path
delay using the remaining output pins, as shown in Figure 7.5.
Figure 7.5
Using Output Strobe #2
(Timing drawing: Input, Output #1, Output #2, Strobe #1 (TS7), and
Strobe #2 (TS8).)
7.6. In a shorter path, the delays being measured are really those of
the input and output buffers and not the delay of the internal
macrocells. An example of this is shown in Figure 7.7.
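Figure 7.6
Long Path Delay, Dominated by Chip Circuitry
(Circuit drawing: IBUF, an LD1 latch with D, G, Q, and QN pins,
many macrocells, and a B4 output buffer.)
Figure 7.7
Short Path Delay, Dominated by Tester Capacitance
(Circuit drawing: CLK driving the CP pin of a flip-flop with D and
Q pins, and the OUT pin.)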
CHAPTER 7 - SECTION 4
Avoid Voltage Spikes and Bus Contention
This section describes how to avoid voltage spikes and bus
contention. It is helpful to understand this information before
creating simulation input patterns.
Figure 7.8
NOR Gate Output Edge Asymmetry
(Gate drawing: inputs A and B, with LOW marked on an input, and
output Z; tPDLH = 0.6 ns, tPDHL = 0.2 ns.)
Figure 7.9 shows the conditions in which a voltage spike will occur
on the NR2 gate in Figure 7.8.
Figure 7.9
NOR Output Asymmetry Generating a Spike/Glitch
(Timing drawing: B low, A with a .3 ns interval marked, and output
Z with tPDLH = .6 ns. Annotations at t+.3 and t+.6 read “Voltage
level questionable” and “Output was not given enough time to
stabilize after input”.)
Figure 7.10
Avoiding the Glitch
(Timing drawing: B low, input levels marked High and Low, and
output Z between t and t+.6.)
Figure 7.10 shows the conditions under which a voltage spike will
not occur on the same NR2 gate.
(Circuit drawing: output half of bidirect, input half of bidirect,
an external source, and signals A, B, and C.)

Values for signals A, B that will cause external bus contention:
A  B
0  X
0  0
1  X
1  1
X  0
X  1
0  1
1  0

Table to the right of the drawing:
A  B
Z  0
Z  1
Z  Z
0  Z
1  Z
Z  Z

Figure 7.11
Avoiding Bus Contention
The table to the right of the drawing, by contrast, shows values
that will not result in external bus contention. Bus contention will
not occur when A is in a high impedance (Z) state while B is in a
defined state (1 or 0), or vice versa, or when both A and B are in
the high impedance (Z) state.
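The rule reduces to a one-line check: contention is possible whenever neither signal is in the high impedance state. The sketch below encodes that reading; the function name and the use of single-character strings for signal values are illustrative assumptions.

```python
def causes_bus_contention(a, b):
    """True when neither A nor B is in the high impedance (Z) state.

    Values are '0', '1', 'X' (driven but unknown), or 'Z' (high impedance).
    Two drivers holding the same value are still reported, because the
    patterns must keep at least one of them in the Z state.
    """
    return a != "Z" and b != "Z"
```

Running each A, B pair from your patterns through this check flags any vector in which both sides of the bus drive at once.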
8
The Beginning
The material in this book should be digested before you begin your
ASIC design. Most of the techniques discussed here have been tried
or used in real designs at LSI Logic. Techniques that do not work or
have caused simulation problems in ASICs have been analyzed, and
this book has proposed alternatives to resolve difficulties caused by
the original logic configurations.
LSI Logic ASIC Customer Engineer (ACE) is from the first day
you begin your design work. That person will be your technical
contact throughout the duration of your interface with LSI Logic.
Use them as a resource.
Training Classes
http://www.lsilogic.com/techsupp/training/index.html
For LSI Logic personnel, you can reach a more expanded list of
training classes at: (Intranet site)
http://webvision.lsil.com
Milpitas, CA 95035
(408) 433-7687
Summary
one. It is our philosophy that each design must work right the first
time; this requires that we provide you the highest quality products,
software tools, and training in the industry. Our training Web sites,
training classes, and reference documents are available to provide
you with the information necessary to make your job as easy as
possible when you are using LSI Logic’s ASIC products and design
tools.
• Service - which we provide at every step of the design process.
• Customer satisfaction - an LSI Logic priority.