Buffer Insertion

Interconnect Optimizations
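A scaling primer
Ideal process scaling:
Device geometries shrink by s (= 0.7x)
Device delay shrinks by s
Wire geometries shrink by s
R per unit length: $\rho/(ws \cdot hs) = r/s^2$
Cc per unit length: $(hs) \cdot \epsilon/(Ss) = C_c$
C per unit length: similar
R per unit length doubles; C and Cc per unit length are unchanged
[Figure: wire with height h, length l, width w, and spacing S]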
Interconnect role
Short (local) interconnect
Used to connect nearby cells
Minimize wire C, i.e., use short min-width wires
Medium to long-distance (global) interconnect

Size wires to trade off area vs. delay
Increasing width: capacitance increases, resistance decreases
Need to find an acceptable tradeoff: the wire sizing problem
Fat wires
Thicker cross-sections in higher metal layers
Useful for reducing delays for global wires
Inductance issues, sharing of limited resource
Cross-Section of A Chip
Block scaling
Block area often stays same
# cells, # nets doubles
Wiring histogram shape invariant
Global interconnect lengths don't shrink
Local interconnect lengths shrink by s
Interconnect delay scaling
Delay of a wire of length l:
$\tau_{int} = (rl)(cl) = rcl^2$ (first order)
Local interconnects:
$\tau_{int}: (r/s^2)(c)(ls)^2 = rcl^2$
Local interconnect delay unchanged (compare to faster devices)
Global interconnects:
$\tau_{int}: (r/s^2)(c)(l)^2 = (rcl^2)/s^2$
Global interconnect delay doubles: unsustainable!
Interconnect delay increasingly more dominant
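The scaling argument above can be checked numerically. The sketch below is illustrative only: the normalized values r = c = l = 1 and the helper name `wire_delay` are assumptions, not part of the slides.

```python
def wire_delay(r, c, length):
    """First-order wire delay (r*l)*(c*l) = r*c*l^2."""
    return r * c * length ** 2

s = 0.7                      # ideal scaling factor per generation
r, c, l = 1.0, 1.0, 1.0      # normalized values, assumed for illustration

base = wire_delay(r, c, l)
# Local wire: resistance per unit length grows by 1/s^2, length shrinks by s.
local_ratio = wire_delay(r / s**2, c, l * s) / base
# Global wire: resistance per unit length grows by 1/s^2, length stays the same.
global_ratio = wire_delay(r / s**2, c, l) / base

print(f"local delay ratio:  {local_ratio:.2f}")
print(f"global delay ratio: {global_ratio:.2f}")
```

The local ratio stays at 1 while the global ratio is 1/s^2, roughly 2 for s = 0.7.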
Buffer Insertion For Delay Reduction
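Analysis of Simple RC Circuit
[Figure: source $v_T(t)$ driving capacitor C through resistor R; current i(t), capacitor voltage v(t)]
$R\,i(t) + v(t) = v_T(t)$
$i(t) = \frac{d(Cv(t))}{dt} = C\frac{dv(t)}{dt}$
$RC\frac{dv(t)}{dt} + v(t) = v_T(t)$
v(t): state variable
$v_T(t)$: input waveform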
Analysis of Simple RC Circuit

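Step-input response:
[Figure: input step $v_0u(t)$ and output $v_0(1 - e^{-t/RC})u(t)$]
$RC\frac{dv(t)}{dt} + v(t) = v_0u(t)$
$v(t) = Ke^{-t/RC} + v_0u(t)$
match initial state:
$v(0) = 0 \Rightarrow K + v_0u(t) = 0 \Rightarrow K = -v_0$ for $t \ge 0$
output response for step-input:
$v(t) = v_0(1 - e^{-t/RC})u(t)$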
Delays of Simple RC Circuit

$v(t) = v_0(1 - e^{-t/RC})$: waveform under step input $v_0u(t)$
$v(t) = 0.5v_0 \Rightarrow t = 0.69RC$, i.e., delay = 0.69RC (50% delay)
$v(t) = 0.1v_0 \Rightarrow t = 0.1RC$
$v(t) = 0.9v_0 \Rightarrow t = 2.3RC$, i.e., rise time = 2.2RC (if defined as time from 10% to 90% of Vdd)
Commonly used metric: $T_D = RC$ (= Elmore delay)
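The 0.69RC, 0.1RC, 2.3RC, and 2.2RC figures follow from inverting the step response. A minimal sketch, with RC normalized to 1 and the helper name `time_to_fraction` chosen here for illustration:

```python
import math

def time_to_fraction(fraction, rc=1.0):
    """Time at which v(t) = v0*(1 - exp(-t/RC)) reaches fraction*v0."""
    return -rc * math.log(1.0 - fraction)

t50 = time_to_fraction(0.5)   # 50% delay, in units of RC
t10 = time_to_fraction(0.1)
t90 = time_to_fraction(0.9)
rise = t90 - t10              # 10% to 90% rise time

print(f"t50 = {t50:.2f} RC, t10 = {t10:.2f} RC, t90 = {t90:.2f} RC, rise = {rise:.2f} RC")
```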
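Elmore Delay
Driver is modeled as R
Driver intrinsic gate delay t(B)
Delay = $\sum_{R_i} \sum_{C_j \text{ downstream from } R_i} R_i C_j$
Elmore delay at n1 = R(B)*(C1+C2)
Elmore delay at n2 = R(B)*(C1+C2) + R(w)*C2
[Figure: buffer B with output resistance R(B) drives node n1 (capacitance C1); wire resistance R(w) connects n1 to node n2 (capacitance C2)]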
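As a rough illustration of the Elmore sum, the sketch below evaluates it on the two-node circuit with nodes n1 and n2. The dictionary-based tree representation, the function name `elmore_delays`, and the numeric values are assumptions made here; the intrinsic gate delay t(B) is left out.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay to every node of an RC tree.

    parent[n]: parent of node n (None for the node driven directly by the driver)
    res[n]:    resistance between parent[n] and n (driver resistance at the root)
    cap[n]:    capacitance lumped at node n
    """
    def downstream_cap(n):
        # Capacitance at n plus everything in the subtree below n.
        return cap[n] + sum(downstream_cap(k) for k in parent if parent[k] == n)

    delays = {}

    def delay_to(n):
        if n not in delays:
            upstream = 0.0 if parent[n] is None else delay_to(parent[n])
            delays[n] = upstream + res[n] * downstream_cap(n)
        return delays[n]

    for n in parent:
        delay_to(n)
    return delays

# Hypothetical values: R(B) = 2, R(w) = 3, C1 = 1, C2 = 4.
parent = {"n1": None, "n2": "n1"}
res = {"n1": 2.0, "n2": 3.0}
cap = {"n1": 1.0, "n2": 4.0}
delays = elmore_delays(parent, res, cap)
print(delays)
```

With these values the delay at n1 is R(B)*(C1+C2) = 10 and the delay at n2 is 10 + R(w)*C2 = 22.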
Elmore Delay
For uniform wire
unit wire capacitance c
unit wire resistance r
delay = $(xr)(xc)/2 + (xr)C$ for a wire of length x driving load C
No matter how the wire is lumped, the Elmore delay is the same
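One way to see the lumping claim is to split the wire into n segments and sum the Elmore terms. The sketch below assumes each segment is a pi-model (half of its capacitance at each end); that modeling choice and the numeric values are assumptions, not something the slides specify.

```python
def lumped_wire_delay(r, c, x, load, n):
    """Elmore delay of a wire of length x split into n pi-segments, driving `load`."""
    seg_r = r * x / n
    seg_c = c * x / n
    delay = 0.0
    for i in range(n):
        # Downstream of segment i's resistor: the far-end half of its own
        # capacitance, all later segments, and the load.
        downstream = seg_c / 2 + (n - 1 - i) * seg_c + load
        delay += seg_r * downstream
    return delay

# Hypothetical values for illustration.
r, c, x, load = 0.5, 0.2, 10.0, 3.0
closed_form = (x * r) * (x * c) / 2 + (x * r) * load
for n in (1, 2, 5, 50):
    print(n, lumped_wire_delay(r, c, x, load, n), closed_form)
```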
Delay for Buffer

[Figure: buffer b between nodes u and v, driving load capacitance C at v]
$delay(u, v) = t(b) + R(b)\,C$
$C(u) = C(b)$
C(b): input capacitance
R(b): driver resistance
t(b): intrinsic buffer delay
Buffers Reduce Wire Delay

[Figure: wire of length x split into two halves of length x/2, each modeled as resistance rx/2 with capacitance cx/4 at each end]
t_unbuf = R( cx + C ) + rx( cx/2 + C )
t_buf = 2R( cx/2 + C ) + rx( cx/4 + C ) + tb
t_buf - t_unbuf = RC + tb - rcx^2/4
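The difference t_buf - t_unbuf = RC + tb - rcx^2/4 turns negative once the wire is long enough, which is when the buffer pays off. The sketch below evaluates both expressions; the break-even length x = 2*sqrt((RC + tb)/(rc)) is derived here by setting the difference to zero, and the numeric values are hypothetical.

```python
import math

def t_unbuf(R, C, r, c, x):
    """Unbuffered delay: driver R, wire of length x, load C."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)

def t_buf(R, C, r, c, x, tb):
    """Delay with one buffer (resistance R, input cap C, intrinsic delay tb) at the midpoint."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb

def breakeven_length(R, C, r, c, tb):
    """Wire length at which t_buf equals t_unbuf."""
    return 2 * math.sqrt((R * C + tb) / (r * c))

R, C, r, c, tb = 1.0, 1.0, 1.0, 1.0, 1.0   # hypothetical values
for x in (2.0, 4.0):
    print(x, t_unbuf(R, C, r, c, x), t_buf(R, C, r, c, x, tb))
print("break-even length:", breakeven_length(R, C, r, c, tb))
```

For these values the buffer hurts at x = 2 and helps at x = 4, consistent with a break-even length of about 2.83.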
Combinational Logic Delay

[Figure: primary input and register feed combinational logic, which drives a register and the primary output; registers are driven by the clock]
Combinational logic delay <= clock period
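Buffered global interconnects: Intuition
Interconnect delay = $r \cdot c \cdot l^2$
[Figure: wire of length l split into segments l1, l2, l3, ..., ln]
Now, interconnect delay = $\sum_i r \cdot c \cdot l_i^2 < r \cdot c \cdot l^2$ (where $l = \sum_j l_j$), since $\sum_j (l_j^2) < (\sum_j l_j)^2$
(Of course, account for buffer delay also)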
Optimal inter-buffer length
First order (lumped parasitic, Elmore delay) analysis

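[Figure: wire of total length L divided by buffers into segments of length l]
Rd: on-resistance of inverter
Cg: gate input capacitance
r, c: resistance, capacitance per micron
Assume N identical buffers with equal inter-buffer length l (L = Nl)
$T = N\left[R_d(C_g + cl) + rl(C_g + cl/2)\right] = L\left[\frac{rcl}{2} + rC_g + R_dc + \frac{R_dC_g}{l}\right]$
For minimum delay, $\frac{dT}{dl} = 0$:
$\frac{rc}{2} - \frac{R_dC_g}{l_{opt}^2} = 0$
$l_{opt} = \sqrt{\frac{2R_dC_g}{rc}}$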
Optimal interconnect delay

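Substituting $l_{opt}$ back into the interconnect delay expression:
$T_{opt} = L\left[\frac{rc\,l_{opt}}{2} + rC_g + R_dc + \frac{R_dC_g}{l_{opt}}\right]$
$= L\left[\frac{rc}{2}\sqrt{\frac{2R_dC_g}{rc}} + rC_g + R_dc + \frac{R_dC_g}{\sqrt{2R_dC_g/(rc)}}\right]$
$T_{opt} = L\left[\sqrt{2R_dC_grc} + rC_g + R_dc\right]$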
Delay grows linearly with L (instead of quadratically)
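The closed forms for the optimal inter-buffer length and the resulting delay can be sanity-checked numerically. In the sketch below the parameter values are hypothetical, N = L/l is treated as a real number rather than rounded to an integer, and intrinsic buffer delay is ignored as in the first-order analysis.

```python
import math

def buffered_delay(L, l, Rd, Cg, r, c):
    """Total delay of N = L/l identical stages of length l."""
    N = L / l
    return N * (Rd * (Cg + c * l) + r * l * (Cg + c * l / 2))

def optimal_length(Rd, Cg, r, c):
    return math.sqrt(2 * Rd * Cg / (r * c))

def optimal_delay(L, Rd, Cg, r, c):
    return L * (math.sqrt(2 * Rd * Cg * r * c) + r * Cg + Rd * c)

# Hypothetical parameter values for illustration.
L, Rd, Cg, r, c = 1000.0, 100.0, 2.0, 0.5, 0.2
l_opt = optimal_length(Rd, Cg, r, c)
T_opt = optimal_delay(L, Rd, Cg, r, c)

print("l_opt =", l_opt)
print("T_opt (closed form) =", T_opt)
print("T at l_opt          =", buffered_delay(L, l_opt, Rd, Cg, r, c))
print("T at 0.5*l_opt      =", buffered_delay(L, 0.5 * l_opt, Rd, Cg, r, c))
print("T at 2.0*l_opt      =", buffered_delay(L, 2.0 * l_opt, Rd, Cg, r, c))
```

Doubling L doubles `optimal_delay`, which is the linear growth noted above.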
Total buffer count

[Chart: % cells used to buffer nets (series clk-buf, buf, tot-buf; y-axis 0 to 80) at 90nm, 65nm, 45nm, 32nm]
Ever-increasing fractions of total cell count will be buffers
70% in 32nm
ITRS projections
[Chart: relative delay (log scale, 0.1 to 100) vs. feature size (250, 180, 130, 90, 65, 45, 32 nm) for gate delay (fanout 4), local interconnect (M1,2), global interconnect with repeaters, and global interconnect without repeaters]
Source: ITRS, 2003
Buffers Improve Slack

RAT = Required Arrival Time
Slack = RAT - Delay
Without buffer:
Sink 1: RAT = 300, Delay = 350, Slack = -50
Sink 2: RAT = 700, Delay = 600, Slack = 100
slackmin = -50
With buffer (decouple capacitive load from critical path):
Sink 1: RAT = 300, Delay = 250, Slack = 50
Sink 2: RAT = 700, Delay = 400, Slack = 300
slackmin = 50
Timing Driven Buffering

Problem Formulation
Given
A Steiner tree
RAT at each sink
A buffer type
RC parameters
Candidate buffer locations
Find a buffer insertion solution such that the slack at the driver is maximized
Candidate Buffering Solutions
Candidate Solution Characteristics
Each candidate solution is associated with
vi: a node
ci: downstream capacitance
qi: RAT
If vi is a sink, ci is the sink capacitance
[Figure: a sink vi and an internal node v]
Van Ginneken's Algorithm
Candidate solutions are propagated toward the source
Dynamic Programming
Solution Propagation: Add Wire
(v1, c1, q1) -> (v2, c2, q2) across a wire of length x
$c_2 = c_1 + cx$
$q_2 = q_1 - rcx^2/2 - rxc_1$
r: wire resistance per unit length
c: wire capacitance per unit length
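A minimal sketch of the add-wire step, representing a candidate as a (capacitance, RAT) tuple. The tuple layout and the function name `add_wire` are choices made here for illustration.

```python
def add_wire(cand, x, r, c):
    """Propagate a (load, q) candidate across a wire of length x."""
    load, q = cand
    new_load = load + c * x
    # The wire's own distributed RC delay plus its resistance charging the old load.
    new_q = q - r * c * x ** 2 / 2 - r * x * load
    return (new_load, new_q)

# (c1, q1) = (1, 20) over a wire of length 2 with r = c = 1.
print(add_wire((1, 20), x=2, r=1, c=1))
```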
Solution Propagation: Insert Buffer
(v1, c1, q1) -> (v1, c1b, q1b)
$c_{1b} = C_b$
$q_{1b} = q_1 - R_bc_1 - t_b$
Cb: buffer input capacitance
Rb: buffer output resistance
tb: buffer intrinsic delay
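The buffer step can be sketched the same way, again with a candidate as a (capacitance, RAT) tuple; the name `insert_buffer` is an illustrative choice.

```python
def insert_buffer(cand, Rb, Cb, tb):
    """Place a buffer in front of a (load, q) candidate."""
    load, q = cand
    # Upstream now sees only the buffer input; q pays for the buffer driving `load`.
    return (Cb, q - Rb * load - tb)

# Candidate (3, 16) with Rb = 1, Cb = 1, tb = 1.
print(insert_buffer((3, 16), Rb=1, Cb=1, tb=1))
```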
Solution Propagation: Merge

(v, cl , ql)
(v, cr , qr)
cmerge = cl + cr
qmerge = min(ql , qr)
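Merging one left candidate with one right candidate at a branch point is a two-line operation. The sketch below uses hypothetical candidate values and the illustrative name `merge`.

```python
def merge(left, right):
    """Combine one candidate from each branch at the same node."""
    cl, ql = left
    cr, qr = right
    # Loads add up; the tighter required arrival time governs.
    return (cl + cr, min(ql, qr))

# Hypothetical candidates for illustration.
print(merge((3, 16), (2, 10)))
```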
Solution Propagation: Add Driver
(v0, c0, q0) -> (v0, c0d, q0d)
$q_{0d} = q_0 - R_dc_0$ = slackmin
Rd: driver resistance
Pick solution with max slackmin
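At the driver, each surviving candidate is scored and the best one is kept. A small sketch, with illustrative function names and candidates as (capacitance, RAT) tuples:

```python
def driver_slack(cand, Rd):
    """slackmin of a (load, q) candidate once the driver delay is charged."""
    load, q = cand
    return q - Rd * load

def best_candidate(cands, Rd):
    """Candidate with the largest slack at the driver."""
    return max(cands, key=lambda cand: driver_slack(cand, Rd))

cands = [(5, 8), (3, 8)]
print([driver_slack(cand, Rd=1) for cand in cands])
print(best_candidate(cands, Rd=1))
```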
Example of Solution Propagation

r = 1, c = 1; Rb = 1, Cb = 1, tb = 1; Rd = 1; each wire has length 2
(v1, 1, 20) -> add wire -> (v2, 3, 16)
(v2, 3, 16) -> insert buffer -> (v2, 1, 12)
(v2, 3, 16) -> add wire -> (v3, 5, 8) -> add driver -> slack = 3
(v2, 1, 12) -> add wire -> (v3, 3, 8) -> add driver -> slack = 5
Example of Merging
[Figure: left candidates and right candidates are combined into merged candidates]
Solution Pruning
Two candidate solutions
(v, c1, q1)
(v, c2, q2)
Solution 1 is inferior if
c1 > c2 : larger load
and q1 < q2 : tighter timing
Pruning When Insert Buffer
They have the same load cap Cb, so only the one with max q is kept
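Because every buffered candidate at a node has the same load Cb, buffering a whole candidate list yields a single new candidate. A sketch under that observation, with an illustrative function name and single-buffer-type assumption:

```python
def best_buffered(cands, Rb, Cb, tb):
    """Buffer every (load, q) candidate and keep only the one with max q."""
    best_q = max(q - Rb * load - tb for load, q in cands)
    return (Cb, best_q)

# Candidates (5, 8) and (3, 8) with Rb = 1, Cb = 1, tb = 1.
print(best_buffered([(5, 8), (3, 8)], Rb=1, Cb=1, tb=1))
```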
Generating Candidates
[Figure: candidate generation, steps (1) to (3). From Dr. Charles Alpert]
Pruning Candidates
[Figure: step (3) candidates (a) and (b), then step (4)]
Both (a) and (b) look the same to the source. Throw out the one with the worst slack.
Candidate Example Continued
[Figure: steps (4) and (5)]
Candidate Example Continued
[Figure: step (5) after pruning]
At driver, compute which candidate maximizes slack. Result is optimal.
Merging Branches
[Figure: left candidates and right candidates at a branch point]
Pruning Merged Branches
[Figure: merged candidates, with labels "Critical" and "With pruning"]
Van Ginneken Example

Candidates are written as (load capacitance, RAT).
Sink: (20,400)
Wire C=10, d=150: (20,400) -> (30,250)
Buffer C=5, d=30: (30,250) -> (5,220)
Wire C=15: (30,250) -> (45,50) with d=200; (5,220) -> (20,100) with d=120
Buffer C=5: (45,50) -> (5,0) with d=50; (20,100) -> (5,70) with d=30
Van Ginneken Example Contd
(5,0) is inferior to (5,70). (45,50) is inferior to (20,100)
Wire C=10: (20,100) -> (30,10); (5,70) -> (15,-10)
Pick solution with largest slack, follow arrows to get solution
Basic Data Structure

Sorted list (c1, q1), (c2, q2), (c3, q3) such that c1 < c2 < c3
Further down the list: worse load cap, better timing
If there are no inferior candidates, q1 < q2 < q3
Prune Solution List

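Candidates (c1, q1), (c2, q2), (c3, q3), (c4, q4) are listed in order of increasing c
[Flowchart: test q1 < q2; if yes, keep candidate 2 and test q2 < q3, otherwise prune 2 and test q1 < q3; continue the same way through (c4, q4), pruning each candidate whose q does not exceed the q of the last kept candidate]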
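The pruning pass over a list ordered by increasing c can be sketched as a single scan that keeps a candidate only if its q beats the last kept one. The function name `prune` is illustrative; the sort is included so the sketch accepts unsorted input, and it would be unnecessary if the list were maintained in sorted order.

```python
def prune(cands):
    """Drop inferior (c, q) candidates; the result has increasing c and increasing q."""
    kept = []
    # Increasing c; for equal c the larger q comes first so it is the one kept.
    for load, q in sorted(cands, key=lambda cand: (cand[0], -cand[1])):
        if not kept or q > kept[-1][1]:
            kept.append((load, q))
    return kept

# Candidates taken from the Van Ginneken example.
print(prune([(45, 50), (5, 0), (20, 100), (5, 70)]))
```

On this input the scan drops (5, 0) and (45, 50), matching the example's statement about which candidates are inferior.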
Pruning In Merging
Left candidates: (cl1, ql1), (cl2, ql2), (cl3, ql3)
Right candidates: (cr1, qr1), (cr2, qr2)
ql1 < ql2 < qr1 < ql3 < qr2
Merged candidates: (cl1+cr1, ql1), (cl2+cr1, ql2), (cl3+cr1, qr1)
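When both lists are already pruned and sorted, the merge can walk them with two pointers instead of forming all pairs. The sketch below is one common way to do this and is an assumption about the intended procedure; the numeric candidates are hypothetical but follow the ordering ql1 < ql2 < qr1 < ql3 < qr2.

```python
def merge_lists(left, right):
    """Merge two pruned candidate lists, each sorted by increasing c and q."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        cl, ql = left[i]
        cr, qr = right[j]
        merged.append((cl + cr, min(ql, qr)))
        # Advance the side that limits q: pairing it with a larger-c candidate
        # from the other side could only add load without improving q.
        if ql < qr:
            i += 1
        elif qr < ql:
            j += 1
        else:
            i += 1
            j += 1
    return merged

left = [(1, 10), (2, 20), (4, 40)]    # hypothetical (cl, ql)
right = [(1, 30), (3, 50)]            # hypothetical (cr, qr)
print(merge_lists(left, right))
```

The output has at most len(left) + len(right) - 1 entries, so merging is additive rather than multiplicative in the number of candidates.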
Van Ginneken Complexity

Generate candidates from sinks to source
Quadratic runtime
Adding a wire does not change #candidates
Adding a buffer adds only one new candidate
Merging branches additive, not multiplicative
Linear time solution list pruning
Optimal for Elmore delay model
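Putting the steps together for the simplest case, a single path from one sink to the driver, gives a compact sketch of the dynamic program. This is a simplified illustration, not a full implementation: it assumes one buffer type, no branching, and returns only the best slack without recording buffer positions. All function and parameter names are chosen here.

```python
def prune(cands):
    """Keep only non-inferior (c, q) candidates, sorted by increasing c."""
    kept = []
    for load, q in sorted(cands, key=lambda cand: (cand[0], -cand[1])):
        if not kept or q > kept[-1][1]:
            kept.append((load, q))
    return kept

def van_ginneken_path(sink, segments, wire, buf, Rd):
    """Best driver slack for a two-pin net.

    sink:     (capacitance, RAT) of the sink
    segments: list of (length, buffer_allowed) from the sink toward the driver;
              buffer_allowed marks a candidate location at the segment's upstream end
    wire:     (r, c) per unit length
    buf:      (Rb, Cb, tb)
    """
    r, c = wire
    Rb, Cb, tb = buf
    cands = [sink]
    for x, buffer_allowed in segments:
        # Add wire: the number of candidates does not change.
        cands = [(load + c * x, q - r * c * x * x / 2 - r * x * load)
                 for load, q in cands]
        if buffer_allowed:
            # Insert buffer: adds exactly one new candidate.
            cands.append((Cb, max(q - Rb * load - tb for load, q in cands)))
        cands = prune(cands)
    # Add driver and pick the maximum slack.
    return max(q - Rd * load for load, q in cands)

# Sink (1, 20), two wires of length 2, one candidate buffer location between them.
slack = van_ginneken_path(sink=(1, 20), segments=[(2, True), (2, False)],
                          wire=(1, 1), buf=(1, 1, 1), Rd=1)
print(slack)
```

With a buffer allowed between the two wires the best slack is 5; with no buffer locations it drops to 3.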
Multiple Buffer Types

r = 1, c = 1; Rd = 1; wire length 2
Rb1 = 1, Cb1 = 1, tb1 = 1
Rb2 = 0.5, Cb2 = 2, tb2 = 0.5
(v1, 1, 20) -> add wire -> (v2, 3, 16)
(v2, 3, 16) -> insert buffer type 1 -> (v2, 1, 12)
(v2, 3, 16) -> insert buffer type 2 -> (v2, 2, 14)
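With several buffer types, the insert-buffer step produces one new candidate per type. A minimal sketch, with buffer types given as (Rb, Cb, tb) tuples and an illustrative function name:

```python
def insert_buffers(cand, buffer_types):
    """One new (load, q) candidate per buffer type (Rb, Cb, tb)."""
    load, q = cand
    return [(Cb, q - Rb * load - tb) for Rb, Cb, tb in buffer_types]

buffer_types = [(1, 1, 1), (0.5, 2, 0.5)]   # (Rb1, Cb1, tb1), (Rb2, Cb2, tb2)
print(insert_buffers((3, 16), buffer_types))
```

Neither new candidate is inferior to the other here: type 2 gives better timing (q = 14) at a larger load (c = 2), so both would be kept.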
