Buffer Insertion
A scaling primer
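[Figure: wire geometry with labels G, h, l, w, S]
Scale the wire dimensions w, h, and S by a factor $s$:
R per unit length : $\rho/(ws \cdot hs) = r/s^2$
Cc per unit length : $\epsilon(hs)/(Ss) = C_c$
C per unit length : similar
R per unit length doubles ($s^2 \approx 1/2$ per generation); C and Cc per unit length unchanged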
Interconnect role
Short (local) interconnect
Used to connect nearby cells
Minimize wire C, i.e., use short min-width wires
Fat wires
Thicker cross-sections in higher metal layers
Useful for reducing delays for global wires
Inductance issues, sharing of limited resource
Cross-Section of A Chip
Block scaling
Block area often stays same
# cells and # nets double
Wiring histogram shape invariant (to first order)
Local interconnects :
$\tau_{int}$ : $(r/s^2)(c)(ls)^2 = rcl^2$
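Global interconnects :
$\tau_{int}$ : $(r/s^2)(c)(l)^2 = rcl^2/s^2$ (wire length does not shrink, since block area stays the same)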
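$R\,i(t) + v(t) = v_T(t)$
$i(t) = \frac{d(Cv(t))}{dt} = C\frac{dv(t)}{dt}$
$RC\frac{dv(t)}{dt} + v(t) = v_T(t)$
State variable: $v(t)$. Input waveform: $v_T(t)$.
[Figure: step input $v_T(t) = v_0 u(t)$ and response $v(t) = v_0(1 - e^{-t/RC})u(t)$, rising toward $v_0$]
For a step input:
$RC\frac{dv(t)}{dt} + v(t) = v_0 u(t)$
$v(t) = Ke^{-t/RC} + v_0 u(t)$
With $v(0) = 0$, $K = -v_0$:
$v(t) = v_0(1 - e^{-t/RC})u(t)$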
$v(t) = 0.5v_0 \Rightarrow t = 0.69RC$ (50% delay)
$v(t) = 0.1v_0 \Rightarrow t = 0.1RC$
$v(t) = 0.9v_0 \Rightarrow t = 2.3RC$
i.e., rise time = 2.2RC (if defined as time from 10% to 90% of Vdd)
$T_D = RC$ (= Elmore delay)
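The threshold times quoted for the step response can be checked by inverting $v(t) = v_0(1 - e^{-t/RC})$, which gives $t = -RC\ln(1 - v/v_0)$. The following is a minimal sketch; the function name `time_to_fraction` and the normalized value `RC = 1.0` are illustrative choices, not part of the slides.

```python
import math

def time_to_fraction(frac, rc):
    """Time at which v(t)/v0 reaches `frac` for v(t) = v0 * (1 - exp(-t/rc))."""
    return -rc * math.log(1.0 - frac)

RC = 1.0  # hypothetical time constant, so all results are in units of RC

t10 = time_to_fraction(0.1, RC)   # about 0.105 RC
t50 = time_to_fraction(0.5, RC)   # about 0.693 RC (50% delay)
t90 = time_to_fraction(0.9, RC)   # about 2.303 RC
rise_time = t90 - t10             # about 2.197 RC (10% to 90%)

print(f"t10={t10:.3f}  t50={t50:.3f}  t90={t90:.3f}  rise={rise_time:.3f}")
```

The rounded values match the 0.1RC, 2.3RC, and 2.2RC figures above; the Elmore delay RC itself corresponds to the point where the response reaches about 63% of $v_0$.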
Elmore Delay
Driver is modeled as R
Driver intrinsic gate delay t(B)
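Delay = $\sum_{i} R_i \sum_{j \in \text{downstream}(R_i)} C_j$, summed over all $R_i$ on the path from the driver to the node
Elmore delay at n1 = $R(B)(C_1 + C_2)$
Elmore delay at n2 = $R(B)(C_1 + C_2) + R(w)C_2$
[Figure: driver B with output resistance R(B) driving node n1 (capacitance C1), followed by wire resistance R(w) to node n2 (capacitance C2)]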
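The Elmore sum can be evaluated on any RC tree by first accumulating the downstream capacitance of every node and then adding resistance times downstream capacitance along the path from the driver. The sketch below is one possible way to do this; the dictionary-based tree encoding, the function name `elmore_delays`, and the numeric values are assumptions made for illustration.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay at every node of an RC tree.

    parent[n]: parent of node n (None for the node driven directly by the driver)
    res[n]:    resistance between parent[n] and n (driver resistance for the root)
    cap[n]:    capacitance attached at node n
    """
    def depth(n):
        d = 0
        while parent[n] is not None:
            n = parent[n]
            d += 1
        return d

    # Deepest nodes first, so each child is folded into its parent before the
    # parent is folded into the grandparent.
    order = sorted(parent, key=depth, reverse=True)
    downstream = dict(cap)
    for n in order:
        if parent[n] is not None:
            downstream[parent[n]] += downstream[n]

    # Shallowest nodes first, so a parent's delay is known before its children.
    delay = {}
    for n in reversed(order):
        upstream = delay[parent[n]] if parent[n] is not None else 0.0
        delay[n] = upstream + res[n] * downstream[n]
    return delay

# Two-node example: R(B) = 2, R(w) = 1, C1 = 3, C2 = 4 (hypothetical values)
delays = elmore_delays(
    parent={"n1": None, "n2": "n1"},
    res={"n1": 2, "n2": 1},
    cap={"n1": 3, "n2": 4},
)
print(delays)  # n1: 2*(3+4) = 14, n2: 14 + 1*4 = 18
```

With these values the result agrees with the closed forms R(B)*(C1+C2) at n1 and R(B)*(C1+C2)+R(w)*C2 at n2.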
Elmore Delay
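For uniform wire of length x driving a load C
unit wire resistance r, unit wire capacitance c
delay = $\frac{(xr)(xc)}{2} + (xr)C$
[Figure: wire between nodes u and v with load capacitance C; label C(b)]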
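[Figure: buffer inserted at the midpoint of a wire of length x; each half (length x/2) is modeled with resistance rx/2 and capacitances cx/4 + cx/4]
R: driver resistance (the formulas below take the buffer to have the same output resistance R and input capacitance C, plus intrinsic delay tb)
$t_{unbuf} = R(cx + C) + rx(cx/2 + C)$
$t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C) + t_b$
$t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$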
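The difference t_buf - t_unbuf is negative, meaning the buffer helps, only when rcx^2/4 exceeds RC + tb, that is, for wires longer than $x = 2\sqrt{(RC + t_b)/(rc)}$. A small numeric sketch of that comparison follows; the function names and the parameter values (all set to 1) are illustrative assumptions.

```python
import math

def t_unbuf(R, C, r, c, x):
    """Elmore delay of a driver (resistance R) through a wire of length x into load C."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)

def t_buf(R, C, r, c, x, tb):
    """Same wire with one buffer at the midpoint (buffer has resistance R, input cap C)."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb

def break_even_length(R, C, r, c, tb):
    """Wire length at which t_buf equals t_unbuf."""
    return 2 * math.sqrt((R * C + tb) / (r * c))

R, C, r, c, tb = 1, 1, 1, 1, 1   # hypothetical normalized values
x = 4

print(t_unbuf(R, C, r, c, x))              # 17.0
print(t_buf(R, C, r, c, x, tb))            # 15.0
print(R * C + tb - r * c * x**2 / 4)       # -2.0, the same difference
print(break_even_length(R, C, r, c, tb))   # about 2.83
```

For these values the buffer pays off at x = 4 because the wire is longer than the break-even length of about 2.83.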
[Figure: Combinational Logic, Register, and Primary Output connected by a wire divided into segments l1, l2, l3, ..., ln]
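Rd: on resistance of inverter
Cg: gate input capacitance
r, c: resistance, capacitance per micron
For a wire of total length L divided into N = L/l segments of length l:
$T = N\left[R_d(C_g + cl) + rl(C_g + cl/2)\right]$
$T = L\left(\frac{rcl}{2} + rC_g + R_dc + \frac{R_dC_g}{l}\right)$
$\frac{dT}{dl} = 0 \Rightarrow \frac{rc}{2} - \frac{R_dC_g}{l_{opt}^2} = 0$
$l_{opt} = \sqrt{\frac{2R_dC_g}{rc}}$
$T_{opt} = L\left(\frac{rc\,l_{opt}}{2} + rC_g + R_dc + \frac{R_dC_g}{l_{opt}}\right)$
$T_{opt} = L\left(\frac{rc}{2}\sqrt{\frac{2R_dC_g}{rc}} + rC_g + R_dc + \frac{R_dC_g}{\sqrt{2R_dC_g/(rc)}}\right)$
$T_{opt} = L\left(\sqrt{2R_dC_grc} + rC_g + R_dc\right)$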
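The closed forms for l_opt and T_opt can be sanity-checked against the original expression for T. The sketch below treats the number of segments N = L/l as a real number, as the derivation does; the function names and the numeric values are illustrative assumptions.

```python
import math

def total_delay(L, l, Rd, Cg, r, c):
    """T = N * [Rd*(Cg + c*l) + r*l*(Cg + c*l/2)] with N = L/l segments."""
    N = L / l
    return N * (Rd * (Cg + c * l) + r * l * (Cg + c * l / 2))

def optimal_segment_length(Rd, Cg, r, c):
    """l_opt = sqrt(2*Rd*Cg / (r*c))."""
    return math.sqrt(2 * Rd * Cg / (r * c))

def optimal_delay(L, Rd, Cg, r, c):
    """T_opt = L * (sqrt(2*Rd*Cg*r*c) + r*Cg + Rd*c)."""
    return L * (math.sqrt(2 * Rd * Cg * r * c) + r * Cg + Rd * c)

# Hypothetical normalized values
L, Rd, Cg, r, c = 10, 2, 1, 1, 1

l_opt = optimal_segment_length(Rd, Cg, r, c)   # 2.0
T_opt = optimal_delay(L, Rd, Cg, r, c)         # 50.0

print(l_opt, T_opt, total_delay(L, l_opt, Rd, Cg, r, c))
# Segment lengths on either side of l_opt give a larger delay:
print(total_delay(L, 1.5, Rd, Cg, r, c), total_delay(L, 2.5, Rd, Cg, r, c))
```

Note that T_opt is linear in L, whereas the unbuffered wire delay grows quadratically with length.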
[Figure: clk-buf, buf, and tot-buf plotted for the 90nm, 65nm, 45nm, and 32nm nodes; vertical axis 0 to 80]
ITRS projections
[Figure: relative delay (axis 0.1 to 100) versus technology node (180, 130, 90, 65, 45, 32)]
RAT = Required Arrival Time
Slack = RAT - Delay
Before buffering:
critical sink: Slack = -50
other sink: RAT = 700, Delay = 400, Slack = 300
slackmin = -50
After buffering (decouple capacitive load from critical path):
critical sink: RAT = 300, Delay = 250, Slack = 50
other sink: RAT = 700, Delay = 600, Slack = 100
slackmin = 50
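The slack figures follow directly from Slack = RAT - Delay, and the net-level figure of merit is the minimum slack over all sinks. A minimal sketch using the (RAT, Delay) pairs quoted above; grouping (300, 250) and (700, 600) as the two sinks of the same buffered net is an assumption made here because it reproduces slackmin = 50.

```python
def slack(rat, delay):
    """Slack = RAT - Delay; negative slack means the sink misses its timing."""
    return rat - delay

# (RAT, Delay) pairs assumed to belong to the buffered net
buffered_sinks = [(300, 250), (700, 600)]

slacks = [slack(rat, delay) for rat, delay in buffered_sinks]
slack_min = min(slacks)

print(slacks)            # [50, 100]
print(slack_min)         # 50
print(slack(700, 400))   # 300, the third pair quoted above
```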
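Candidate (vi, ci, qi):
vi: a node
ci: downstream capacitance
qi: RAT
If vi is a sink, ci is the sink capacitance
Add wire of length x from (v1, c1, q1) to an internal node v, giving (v, c2, q2):
$c_2 = c_1 + cx$
$q_2 = q_1 - rcx^2/2 - rxc_1$
r: wire resistance per unit length
c: wire capacitance per unit length
Insert buffer at (v1, c1, q1), giving (v1, c1b, q1b):
$c_{1b} = C_b$
$q_{1b} = q_1 - R_bc_1 - t_b$
Cb: buffer input capacitance
Rb: buffer output resistance
tb: buffer intrinsic delay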
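The two update rules translate directly into code. The sketch below represents a candidate as a `(c, q)` tuple and omits the node label; the function names are illustrative, and the wire length `x = 2` is an assumption chosen because it reproduces the candidate values (3, 16) and (1, 12) that appear in the worked example with r = c = Rb = Cb = tb = 1.

```python
def add_wire(cand, x, r, c):
    """Move a candidate (cap, q) upstream across a wire of length x."""
    cap, q = cand
    return (cap + c * x, q - r * c * x * x / 2 - r * x * cap)

def insert_buffer(cand, Rb, Cb, tb):
    """Place a buffer in front of a candidate (cap, q)."""
    cap, q = cand
    return (Cb, q - Rb * cap - tb)

r, c = 1, 1
Rb, Cb, tb = 1, 1, 1
x = 2  # assumed wire length

at_v2 = add_wire((1, 20), x, r, c)            # (3, 16.0)
buffered = insert_buffer(at_v2, Rb, Cb, tb)   # (1, 12.0)

print(at_v2, buffered)
```

Adding a wire increases the load and reduces q; inserting a buffer resets the load to Cb at the cost of the buffer delay Rb*c1 + tb.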
Merge left candidate (v, cl, ql) and right candidate (v, cr, qr):
cmerge = cl + cr
qmerge = min(ql , qr)
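Example: r = 1, c = 1, Rb = 1, Cb = 1, tb = 1, Rd = 1
Without buffer:
(v1, 1, 20) -> add wire -> (v2, 3, 16) -> add wire -> (v3, 5, 8) -> add driver -> slack = 3
With buffer at v2:
(v1, 1, 20) -> add wire -> (v2, 3, 16) -> insert buffer -> (v2, 1, 12) -> add wire -> (v3, 3, 8) -> add driver -> slack = 5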
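The example can be reproduced end to end by propagating candidates from the sink to the driver and offering a buffer at each internal node. This is a sketch under stated assumptions: both wire segments are taken to have length 2 (which reproduces the numbers in the example), buffers are allowed only at internal nodes, the driver step is taken as slack = q - Rd*c, and all function names are illustrative.

```python
def add_wire(cand, x, r, c):
    cap, q = cand
    return (cap + c * x, q - r * c * x * x / 2 - r * x * cap)

def insert_buffer(cand, Rb, Cb, tb):
    cap, q = cand
    return (Cb, q - Rb * cap - tb)

def add_driver(cand, Rd):
    """Slack seen at the driver: q minus the driver delay Rd * cap."""
    cap, q = cand
    return q - Rd * cap

def candidates_at_driver(sink, segments, r, c, Rb, Cb, tb):
    """Propagate candidates from the sink to the driver over the given wire segments."""
    cands = [sink]
    for i, x in enumerate(segments):
        cands = [add_wire(k, x, r, c) for k in cands]
        if i < len(segments) - 1:
            # Internal node: keep the unbuffered candidates and add buffered ones.
            cands += [insert_buffer(k, Rb, Cb, tb) for k in cands]
    return cands

r, c, Rb, Cb, tb, Rd = 1, 1, 1, 1, 1, 1
segments = [2, 2]  # assumed lengths of v1-v2 and v2-v3

cands = candidates_at_driver((1, 20), segments, r, c, Rb, Cb, tb)
slacks = [add_driver(k, Rd) for k in cands]
best = max(slacks)

print(cands)   # [(5, 8.0), (3, 8.0)]
print(slacks)  # [3.0, 5.0]
print(best)    # 5.0
```

The buffered candidate wins because it presents a smaller load to the driver even though both candidates reach v3 with the same q.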
Example of Merging
[Figure: left candidates, right candidates, and the resulting merged candidates]
Solution Pruning
Two candidate solutions
(v, c1, q1)
(v, c2, q2)
Solution 1 is inferior if
c1 > c2 : larger load
and q1 < q2 : tighter timing
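One common way to apply the pruning rule to a whole list is to sort candidates by capacitance and keep a candidate only if it improves q over every candidate with a smaller load. The sketch below does that with `(c, q)` tuples; the function name is illustrative, and ties are also dropped (a candidate with equal c or equal q that is no better is removed), which is slightly stricter than the strict inequalities stated above.

```python
def prune(cands):
    """Return the non-inferior candidates, sorted by increasing c and increasing q."""
    kept = []
    best_q = float("-inf")
    # Sort by load ascending; for equal loads, try the better q first.
    for cap, q in sorted(cands, key=lambda k: (k[0], -k[1])):
        if q > best_q:          # better timing than every lighter candidate
            kept.append((cap, q))
            best_q = q
        # otherwise: load is no smaller and timing is no better, so it is pruned
    return kept

print(prune([(30, 250), (5, 220), (20, 400)]))  # [(5, 220), (20, 400)]
```

Here (30, 250) is pruned because (20, 400) has both a smaller load and a larger q. After pruning, c and q increase together along the list, which is what makes linear-time merging possible.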
Generating Candidates
[Figure: steps (1), (2), (3)]
Pruning Candidates
[Figure: steps (3), (4), (5), with cases (a) and (b)]
Merging Branches
[Figure: left candidates and right candidates combined at a branch point]
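[Figure: candidates (c, q) propagated from a sink toward the driver. Sink candidate: (20, 400). The first wire (C=10, d=150) gives (30, 250); the candidate list at that point also shows (5, 220). The second wire is labeled C=15, d=200 and C=15, d=120; the candidate list at that point shows (20, 100) and (5, 70).]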
[Figure: flowchart for pruning candidates (c1, q1), (c2, q2), (c3, q3), (c4, q4), ordered so that later candidates should have better timing. Each test (q1 < q2 ?, q1 < q3 ?, q2 < q3 ?, ...) keeps the later candidate on Y and prunes it otherwise (Prune 2, Prune 3, Prune 4).]
Pruning In Merging
Left candidates: (cl1, ql1), (cl2, ql2), (cl3, ql3)
Right candidates: (cr1, qr1), (cr2, qr2)
[Figure: the two lists are merged step by step; merged candidates shown include (cl2+cr1, ql2) and (cl3+cr1, qr1)]
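When both lists are already pruned and sorted (c and q increasing together), merging does not need all left/right pairs: a two-pointer sweep that always advances the list with the smaller q produces only useful combinations. The sketch below illustrates this with `(c, q)` tuples; the function name and the numeric values are hypothetical, chosen so that ql1 < ql2 < qr1 < ql3 < qr2.

```python
def merge_candidates(left, right):
    """Merge two pruned, sorted candidate lists at a branch point.

    Each merged candidate has c = cl + cr and q = min(ql, qr).
    """
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        cl, ql = left[i]
        cr, qr = right[j]
        merged.append((cl + cr, min(ql, qr)))
        # Advance the side that limits q; moving the other side would only add
        # load without improving min(ql, qr).
        if ql <= qr:
            i += 1
        if qr <= ql:
            j += 1
    return merged

left = [(1, 10), (2, 20), (4, 40)]   # (cl1, ql1), (cl2, ql2), (cl3, ql3)
right = [(1, 30), (3, 50)]           # (cr1, qr1), (cr2, qr2)

merged = merge_candidates(left, right)
print(merged)  # [(2, 10), (3, 20), (5, 30), (7, 40)]
```

In symbols the result is (cl1+cr1, ql1), (cl2+cr1, ql2), (cl3+cr1, qr1), (cl3+cr2, ql3), so the merged list has at most one fewer entry than the two input lists combined, rather than one entry per pair.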
[Figure: with r = 1, c = 1, candidate (v1, 1, 20) becomes (v2, 3, 16) after adding a wire; candidates (v2, 2, 14) and (v2, 1, 12) are also shown at v2]