System-on-Chip Test Architectures Ch. 7 - Low-Power Testing - P. 1
Low-Power Testing
Focus
Outline
1. Introduction
2. Energy and power modeling
3. Test power issues
4. Low-power scan testing
5. Low-power BIST
6. Low-power test data compression
7. Summary and conclusion
1. Introduction
Power dissipation in test mode is much higher than during functional mode:
The circuit is highly stressed
No correlation between consecutive test vectors
Test vectors ignore functional constraints
DFT circuitry is intensively used
Parallel testing is often used for efficiency
Low-power functional features (e.g., gated clock) are often disabled during test
Industry generally resorts to ad-hoc solutions:
Oversizing packages and using cooling systems
Testing at a reduced operating frequency
Partitioning and appropriate test planning
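2. Energy and Power Modeling
Static power: power consumed when the circuit is idle (leakage power)
Dynamic power: power consumed when the circuit is switching its state
Charging (0→1): ½·C·Vdd² of energy dissipated as heat (C is the switched load capacitance)
[Figure: CMOS gate with P and N networks connected to Vdd]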
EE141 System-on-Chip Test Architectures
Energy consumed during test has an impact on battery lifetime during power-up or periodic self-test of battery-operated devices
Power dissipated during test determines the thermal and electrical limits of components and the system packaging requirements
[Figure labels: average power, switching activity, test frequency, test length]
[Figure labels: combinational toggling, sequential toggling, clock toggling]
3. Test Power Issues
Heat produced during the functioning of a circuit is proportional to the dissipated power (Joule effect) and is responsible for the increase in die temperature
Excessive temperature can cause irreversible structural degradation (premature destruction)
Excessive temperature may also degrade circuit performance or affect the IC's reliability (corrosion, electromigration, hot-carrier-induced defects, dielectric breakdown, etc.)
Power supply noise, L(di/dt), is due to current variations through inductive connections (probes for wafer testing, pins for packaged circuits)
Ground bounce or voltage surge/droop may change the rise/fall times of some signals in the circuit
IR drop (resistive effect) and crosstalk (capacitive effect) have similar effects
4. Low-Power Scan Testing
[Figure: scan test timing showing shift, capture, and shift phases, the SE signal, and load/unload cycles over time]
Launch-off-capture (LOC) scheme
[Figure: LOC timing showing shift, response capture, and shift phases, the SE signal, and load/unload cycles over time]
Launch is caused by the difference between the values loaded by the last shift pulse (V1) and the first capture pulse (V2)
SE is easy to implement, but fault coverage is lower than with LOS
Launch-off-shift (LOS) scheme
[Figure: LOS timing showing shift, response capture, and shift phases and the SE signal over time]
Launch is caused by the difference between the values loaded by the next-to-last (V1) and the last (V2) shift pulses
Higher fault coverage than LOC, but SE is not easy to implement
The problem of excessive power during scan testing can be split into two sub-problems: excessive power during the shift operation (shift power) and excessive power during the capture operation (capture power)
At-speed scan testing is especially vulnerable to excessive IR drop caused by the high switching activity generated in the CUT between launch and capture, which can result in yield loss
The don't-care bits (Xs) in a given ATPG test cube nearly always make up a very large fraction of the total number of bits, despite the application of state-of-the-art dynamic and static test pattern compaction techniques
In classical ATPG, Xs are randomly filled, and the resulting fully specified pattern is then simulated to confirm detection of all targeted faults and to measure the amount of fortuitous detection
Clever assignment of don't-care bits in combinational (PODEM-like) ATPG in order to minimize the number of transitions between two consecutive test vectors
Minimizing the difference between the before-capture and after-capture output values of a scan flip-flop
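The idea of assigning don't-care bits so that consecutive vectors differ as little as possible can be illustrated with a small sketch. This is a simplification: it fills Xs after the fact by copying the previous vector, whereas a low-power ATPG makes such choices during test generation. The function names and the example vectors are hypothetical.

```python
def fill_from_previous(previous: str, cube: str) -> str:
    """Replace each X in `cube` with the bit at the same position in `previous`."""
    if len(previous) != len(cube):
        raise ValueError("vectors must have the same length")
    return "".join(p if c == "X" else c for p, c in zip(previous, cube))


def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions that toggle when `b` is applied after `a`."""
    return sum(x != y for x, y in zip(a, b))


previous_vector = "10110"   # hypothetical fully specified vector
test_cube = "1XX01"         # hypothetical cube with two don't-care bits
filled = fill_from_previous(previous_vector, test_cube)
print(filled, hamming_distance(previous_vector, filled))  # prints: 10101 2
```

Only the two specified bits that disagree with the previous vector toggle; the don't-care positions contribute no transitions.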
Applicable at the end of the design process, with no area overhead
Reduces test power consumption at the cost of a reasonable increase in test length
A few solutions exist for reducing power during the test cycle (LOC)
Static compaction minimizes the number of test cubes generated by an ATPG tool by merging test cubes that are compatible in all bit positions
Example 1: 11XX0 and 1X0X0 are compatible (merged cube: 110X0)
Example 2: 11XX0 and 011X1 are not compatible (they conflict in the first bit position)
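The compatibility check and merge used in the two examples can be sketched as follows. The function name `merge_cubes` and the string encoding of cubes are assumptions made for illustration.

```python
from typing import Optional


def merge_cubes(a: str, b: str) -> Optional[str]:
    """Merge two test cubes over {0, 1, X}; return None if they conflict."""
    if len(a) != len(b):
        raise ValueError("cubes must have the same length")
    merged = []
    for x, y in zip(a, b):
        if x == "X":
            merged.append(y)          # take whatever the other cube specifies
        elif y == "X" or x == y:
            merged.append(x)
        else:
            return None               # one cube needs 0, the other needs 1
    return "".join(merged)


print(merge_cubes("11XX0", "1X0X0"))  # prints: 110X0
print(merge_cubes("11XX0", "011X1"))  # prints: None
```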
Conventional approaches target the minimum number of final test cubes
[Sankaralingam 2000] used a greedy heuristic for merging test cubes in a way that minimizes the number of transitions (using the weighted transition metric)
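A minimal sketch of a weighted transition count for one scan vector is given below. It assumes the commonly cited form in which a transition between positions i and i+1 of a vector of length L is weighted by (L - i), with positions numbered from 1; the bit-ordering convention and the function name are assumptions, not taken from the slides.

```python
def weighted_transitions(vector: str) -> int:
    """Sum of (L - i) over every position i (1-based) where bit i differs from bit i+1."""
    length = len(vector)
    return sum(
        (length - i) * (vector[i - 1] != vector[i])
        for i in range(1, length)
    )


print(weighted_transitions("10001"))  # prints: 5  (weights 4 and 1)
print(weighted_transitions("01010"))  # prints: 10 (weights 4 + 3 + 2 + 1)
```

A greedy merging heuristic can use such a metric to rank candidate merges, preferring the merged cube with the lowest value.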
[Figure: gated scan cell with D, SI, SE, and CLK inputs and Q and SO outputs]
Gated scan cells block transitions during scan shifting
Very effective in test power reduction
Significant area overhead and performance degradation
[Figure: bit-level example of scan cell reordering on 4-bit test vectors]
The order of bits in each vector must be changed accordingly during test application
Scan cell reordering may lead to significant power reduction (up to 66%)
No overhead, fault coverage (FC) and test time unchanged, low impact on design flow
May lead to routing congestion problems
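The effect of reordering can be sketched with a simple greedy heuristic: place next to each scan cell the remaining cell whose column of test bits differs least from it. This is an illustrative nearest-neighbor ordering, not the algorithm of any particular paper, and it uses an unweighted transition count as a rough proxy for shift power. All names and the example vectors are hypothetical.

```python
from typing import List, Sequence


def adjacent_transitions(vectors: Sequence[str], order: Sequence[int]) -> int:
    """Count differing adjacent bits over all vectors for a given cell order."""
    return sum(v[a] != v[b] for v in vectors for a, b in zip(order, order[1:]))


def greedy_order(vectors: Sequence[str]) -> List[int]:
    """Nearest-neighbor ordering of scan cells, starting from cell 0."""
    def distance(a: int, b: int) -> int:
        return sum(v[a] != v[b] for v in vectors)

    order = [0]
    remaining = set(range(1, len(vectors[0])))
    while remaining:
        last = order[-1]
        # Ties are broken by the lower cell index to keep the result deterministic.
        nearest = min(remaining, key=lambda c: (distance(last, c), c))
        order.append(nearest)
        remaining.remove(nearest)
    return order


vectors = ["0101", "1010", "0101"]        # hypothetical test set
original = [0, 1, 2, 3]
reordered = greedy_order(vectors)
print(adjacent_transitions(vectors, original))   # prints: 9
print(reordered, adjacent_transitions(vectors, reordered))  # prints: [0, 2, 1, 3] 3
```

The bits of every vector have to be permuted in the same way before they are shifted in, which is the reordering step mentioned above.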
Partition the circuit into clusters (using geographical criteria)
Then reorder the scan cells within each cluster so as to reduce the weighted switching activity (WSA)
Clusters are then stitched together using the nearest-neighbor criterion
Good tradeoff between test power reduction and scan chain length
[Figure: scan chain segmentation with a clock adaptor generating CLKA, CLKB, and CLKC for the scan chain segments; ScanOut; capture]
One segment at a time is active during scan shifting
Average power is reduced by a factor of N for N segments, with no impact on area and FC
Clock power is reduced by gating the clock trees rather than the SE signals
Inserting logic elements (XOR gates) between scan cells in order to minimize the number of transitions occurring inside the scan chain
Using buffers (of various sizes) in multi-scan circuits to introduce a slight time shift between scan chains and reduce peak power
[Figure: token scan architecture with scan cells 1, 2, ..., j, ..., N, a multiphase generator driven by CLK, and ScanOut]
Scan architecture that uses the concept of a token ring to reduce shift power
SI is broadcast to all scan cells, but only one scan cell is activated at a time
An N-phase non-overlapping clocking scheme is applied, with one clock for each scan cell
[Figure: token scan cell schematic (signals ScanIn, TCK, CLK, CLR, ScanOut)]
Alternative solution that avoids the large area overhead of the N multiphase clock routes and the inter-phase skews due to their different lengths
It embeds the multiphase clock generator into each scan cell
Requires the use of a new type of scan cell, called a token scan cell
[Figure: scan chain split into scan cells A and scan cells B, clocked at CLK/2, with SE and clock waveforms over time]
The two clocks are synchronous with the system clock and have the same period during the shift operation, but they are shifted in time with respect to each other
During the capture operation, the two clocks operate as the system clock
Lowers the transition density in the CUT, the scan chains, and the clock tree
A test pattern generator (TPG) automatically generates test patterns for application to the inputs of the circuit under test (CUT)
In-circuit TPGs constructed from LFSRs are most commonly used
LFSRs are also used as output response analyzers (ORAs)
BIST is implemented as test-per-scan or as test-per-clock
Even though it is slower, test-per-scan is the industry's preferred solution today
5. Low-Power BIST
Low-power test pattern generators (1/3)
[Figure: circuit under test with a slow LFSR/MISR and a normal-speed LFSR/MISR; CLK]
The dual-speed LFSR is based on two LFSRs running at different frequencies
Average power during test is reduced by connecting the CUT inputs with the highest transition densities to the low-speed LFSR, while CUT inputs with the lowest activity are connected to the normal-speed LFSR
5. Low-Power BIST
Low-power test pattern generators (2/3)
[Figure: LFSR, TFF, and scan chain in the CUT; labels k, r, m, SI, ScanOut]
The low-transition random test pattern generator inserts an AND gate and a toggle flip-flop (TFF) between the LFSR and the input of the scan chain to increase the correlation of neighboring bits in the scan vectors
The TFF holds its previous value until it receives a 1 on its input
The same value (0 or 1) is repeatedly scanned into the scan chain until the value at the output of the AND gate becomes 1
5. Low-Power BIST
Low-power test pattern generators (3/3)
Test power can also be reduced:
By carefully choosing the seed of the LFSR (the choice of polynomial has no real influence)
By inserting translating logic between the LFSR and the CUT to obtain weighted random test vectors
By using Gray counters, which produce consecutive test vectors that differ in only one bit, for deterministic testing of data paths
5. Low-Power BIST
Vector filtering BIST
[Figure: test sequence V0, ..., Vi, Vj, Vk, Vl produced by the LFSR; decoder generating the LFSR inhibition signal; CLK]
Prevents the application of non-detecting (but power-consuming) vectors to the CUT
A decoder is used to store the first and last vectors of each sub-sequence of consecutive non-detecting vectors to be filtered
Minimizes average power without reducing fault coverage
5. Low-Power BIST
Circuit partitioning
[Figure: circuit with inputs A, B, C, D partitioned into sub-circuits C1 and C2, with MUX and DMUX elements inserted at the partition boundaries]
Partition the original circuit (using a graph partitioning algorithm that minimizes the cut size) into structural sub-circuits so that each sub-circuit can be successively tested through different BIST sessions
FC and test time are unchanged, and area overhead is quite low
Drawbacks are a slight penalty on performance and an impact on routing
5. Low-Power BIST
Power-aware test scheduling (1/3)
[Figure: power versus test time, with a power limit]
The goal is to determine which blocks (memory, logic, analog, etc.) of an SOC are tested in parallel at each stage of the BIST session, in order to keep power dissipation under a specified limit while optimizing test time
Some of the test resources (pattern generators and response analyzers) must be shared among the various blocks
5. Low-Power BIST
Power-aware test scheduling (2/3)
The NP-complete test scheduling problem may be addressed by using a compatibility graph and heuristic-driven algorithms
For given power constraints and parameters related to the test organization (fixed, variable, or undefined test sessions, with or without precedence constraints) or to the test structure (test bus width, test resource sharing), these solutions make it possible to optimize the overall SOC test time
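One possible heuristic of this kind is a first-fit greedy scheduler, sketched below. It assumes fixed test sessions (a session lasts as long as its longest test), a single power number per test, and a set of incompatible pairs standing in for the complement of the compatibility graph. The block names, times, and power values are made up for illustration, and the heuristic does not guarantee an optimal schedule.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Test = Tuple[int, int]  # (test time, test power), arbitrary units


def schedule_tests(
    tests: Dict[str, Test],
    power_limit: int,
    conflicts: Set[FrozenSet[str]],
) -> List[List[str]]:
    """Assign tests to sessions, longest first, using first-fit under the power limit."""
    sessions: List[List[str]] = []
    for name in sorted(tests, key=lambda n: (-tests[n][0], n)):
        power = tests[name][1]
        if power > power_limit:
            raise ValueError(f"{name} alone exceeds the power limit")
        for session in sessions:
            used = sum(tests[t][1] for t in session)
            clash = any(frozenset((name, t)) in conflicts for t in session)
            if used + power <= power_limit and not clash:
                session.append(name)
                break
        else:
            sessions.append([name])   # no existing session fits: open a new one
    return sessions


def total_time(tests: Dict[str, Test], sessions: List[List[str]]) -> int:
    """Sessions run one after another; each lasts as long as its longest test."""
    return sum(max(tests[t][0] for t in s) for s in sessions)


tests = {"ram": (100, 4), "cpu": (80, 5), "dsp": (60, 3), "adc": (20, 2)}
conflicts = {frozenset(("cpu", "dsp"))}   # e.g. the two blocks share one pattern generator
sessions = schedule_tests(tests, power_limit=8, conflicts=conflicts)
print(sessions, total_time(tests, sessions))
# prints: [['ram', 'dsp'], ['cpu', 'adc']] 180
```

Testing all four blocks one after another would take 260 time units, so the schedule above saves test time while every session stays within the power limit of 8.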
5. Low-Power BIST
Power-aware test scheduling (3/3)
[Figure: SOC with cores 1 to 5, per-core BIST, an embedded tester with tester memory, and a test controller]
The test set is composed of core-level, locally generated pseudo-random test patterns and additional deterministic test patterns that are generated off-line and stored in the system
A careful tradeoff between the deterministic pattern lengths of the cores must therefore be made in order to produce a globally optimal solution
6. Low-Power Test Data Compression
High test data volume leads to long test times and may exceed the limited memory depth of the ATE
Test data compression involves encoding a test set so as to reduce its size
ATE limitations, i.e., tester storage memory and the bandwidth gap between the ATE and the CUT, may hence be overcome
Using compressed test data requires an on-chip decoder that decompresses the data
Low-power test data compression techniques are needed to concurrently reduce scan power dissipation and test data volume during test
ATPG test cubes are 0-filled, and the runs of 0s are then encoded with Golomb codes (run-length codes), which reduces the number of transitions (75%)
Golomb coding is very inefficient for runs of 1s
A synchronization signal between the ATE and the CUT is required because the compressed data (codewords) have variable length
Alternating run-length coding improves the encoding efficiency of Golomb coding (it can encode both runs of 0s and runs of 1s)
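The 0-fill and Golomb encoding steps can be sketched as follows. The sketch assumes a group size m that is a power of two, runs of 0s that are each terminated by a 1, and a codeword made of a unary prefix (run // m ones followed by a 0) and a fixed-width binary tail (run % m). The function names and the example stream are hypothetical, and a stream that ends in 0s is rejected here rather than padded.

```python
def zero_fill(cube: str) -> str:
    """Replace every don't-care bit with 0 to create long runs of 0s."""
    return cube.replace("X", "0")


def golomb_encode(bits: str, m: int = 4) -> str:
    """Encode runs of 0s (each ended by a 1) with a Golomb code of group size m."""
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two, at least 2")
    tail_width = m.bit_length() - 1
    codewords = []
    run = 0
    for bit in bits:
        if bit == "0":
            run += 1
        elif bit == "1":
            codewords.append("1" * (run // m) + "0" + format(run % m, f"0{tail_width}b"))
            run = 0
        else:
            raise ValueError("input must be fully specified (0/1 only)")
    if run:
        raise ValueError("this sketch expects the stream to end with a 1")
    return "".join(codewords)


def golomb_decode(code: str, m: int = 4) -> str:
    """Inverse of golomb_encode for the same group size m."""
    tail_width = m.bit_length() - 1
    out = []
    i = 0
    while i < len(code):
        groups = 0
        while code[i] == "1":         # unary prefix
            groups += 1
            i += 1
        i += 1                        # skip the 0 that ends the prefix
        remainder = int(code[i:i + tail_width], 2)
        i += tail_width
        out.append("0" * (groups * m + remainder) + "1")
    return "".join(out)


stream = zero_fill("XXXX0XX1X0X11")   # hypothetical scan-in data
encoded = golomb_encode(stream)
print(stream, encoded)                # prints: 0000000100011 1011011000
```

The run of length 0 between the last two 1s still costs a full 3-bit codeword, which illustrates why this code is inefficient for runs of 1s.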
A low-power linear decompressor (LD) using LFSR reseeding can be used
LFSR reseeding is not used to directly encode specified bits
Each test cube is divided into blocks
LFSR reseeding is used only to produce blocks containing transitions
For blocks not containing transitions, the logic value fed into the scan chain is simply held constant
Reduces the number of transitions in the scan chain
Efficient solution to the trade-off between test data compression and test power reduction
[Figure: scan segments 1 to M selected through a multi-hot decoder, with an output compressor]
Based on broadcasting the same value to multiple scan segments
SAC enhances the Illinois scan architecture by removing the requirement that all segments be compatible in order to benefit from the segmentation
Test power is reduced because segments that are incompatible during the time needed to upload a given test pattern are not clocked
[Figure: PRAS scan cell array with column address decoding]
Scan cells are configured as an SRAM-like structure using PRAS scan cells
PRAS allows each scan cell to be accessed individually, thus eliminating unnecessary switching activity during scan while reducing test application time and test data volume by updating only a small fraction of the scan cells
Motivated by the need to concurrently test several banks of memories in a system to reduce test time
A first strategy is to reorder memory tests to reduce the switching activity on each address line while retaining the fault coverage and the overall memory test time
[Table: Original Test vs. Low-power Test]
s (W0, R0, W1, R1);
s (W(1odd/0even), R(1odd/0even), W(0odd/1even), R(0odd/1even));
Power dissipation is reduced by a factor of two to sixteen
A special design of the BIST circuitry is needed
A second strategy is to exploit the predictability of the addressing sequence to reduce the pre-charge activity during test
Pre-charge circuits account for up to 70% of power dissipation
In functional mode, the cells are selected in random sequence, and all pre-charge circuits must always be active; in test mode, the access sequence is known, and hence only the columns that are to be selected need to be pre-charged
This low-power test mode can be implemented by using modified pre-charge control circuitry and by exploiting the first degree of freedom of March tests, which allows choosing a specific addressing sequence
The addressing sequence is fixed to word line after word line, and the pre-charge activity is restricted to only two columns for each clock cycle: the selected column and the following one
[Figure: modified pre-charge control for columns j-1, j, and j+1 (signals Prec, Pr(j-1), Pr(j), Pr(j+1), CS(j-1), CS(j), CS(j+1))]
50% power savings with negligible impact on area overhead and memory performance
7. Summary and Conclusion
Test throughput and manufacturing yield may be affected by excessive test power
Therefore, lowering test power has been, and still is, a focus of intense research and development
The following points have been surveyed:
Problems induced by increased test power
Structural and algorithmic solutions for low-power test, along with their impacts on parameters such as fault coverage, test time, area overhead, circuit performance penalty, and design flow modification
Dynamic power management techniques "shut down" parts of the design when idle
Testing is currently done sequentially: test deals with power domains one at a time
This practice is becoming inadequate due to test time concerns
Multiple-voltage domains are used to reduce power
How can the test of such designs be handled safely?