Lecture9 Placement1
Placement (1)
Prof. David Pan
dpan@ece.utexas.edu
Office: ACES 5.434
10/22/08 1
Problem formulation
• Input:
– Blocks (standard cells and macros) B1, ... , Bn
– Shapes and Pin Positions for each block Bi
– Nets N1, ... , Nm
• Output:
– Coordinates (xi , yi ) for block Bi.
– No overlaps between blocks
– The total wire length is minimized
– The area of the resulting block is minimized or given a fixed
die
• Other consideration: timing, routability, clock, buffering
and interaction with physical synthesis
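The output constraints above can be checked mechanically. The following is a minimal sketch, assuming each block is stored as a hypothetical `(x, y, width, height)` tuple with `(x, y)` its lower-left corner; pin positions and nets are left out here.

```python
from itertools import combinations
from typing import Dict, Tuple

# Assumed representation: (x, y, width, height), (x, y) = lower-left corner.
Rect = Tuple[float, float, float, float]

def overlaps(a: Rect, b: Rect) -> bool:
    """True if a and b share interior area (touching edges do not count)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def is_legal(placement: Dict[str, Rect]) -> bool:
    """Check the 'no overlaps between blocks' requirement for all pairs."""
    return not any(overlaps(a, b)
                   for a, b in combinations(placement.values(), 2))

def bounding_area(placement: Dict[str, Rect]) -> float:
    """Area of the smallest rectangle that encloses every placed block."""
    x0 = min(x for x, _, _, _ in placement.values())
    y0 = min(y for _, y, _, _ in placement.values())
    x1 = max(x + w for x, _, w, _ in placement.values())
    y1 = max(y + h for _, y, _, h in placement.values())
    return (x1 - x0) * (y1 - y0)

placement = {"B1": (0, 0, 2, 1), "B2": (2, 0, 3, 1), "B3": (0, 1, 4, 1)}
print(is_legal(placement), bounding_area(placement))
```

The pairwise check is quadratic in the number of blocks, which is fine for illustration but not for the design sizes discussed later in the lecture.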
2
Different Wire Length
3
Different Routability/Chip Area
4
Placement can Make a Difference
• MCNC benchmark circuit e64 (contains 230 4-LUTs), placed on an FPGA.
[Figure panels: random initial placement; final placement; after detailed routing]
5
Importance of Placement
• Placement is a fundamental problem for physical design
• Glue of the physical synthesis
• Becomes very active again in recent years:
– Many new academic placers for WL min since 2000
– Many other publications to handle timing, routability, etc.
• Reasons:
– Serious interconnect issues (delay, routability, noise) in deep-
submicron design
• Placement determines interconnect to the first order
• Need placement information even in early design stages (e.g., logic
synthesis)
– Placement problem becomes significantly larger
– Cong et al. [ASPDAC-03, ISPD-03, ICCAD-03] point out that
existing placers are far from optimal, not scalable, and not stable
6
Design Types
• ASICs
– Lots of fixed I/Os, few macros, millions of standard cells
– Placement densities : 40-80% (IBM)
– Flat and hierarchical designs
• SoCs
– Many more macro blocks, cores
– Datapaths + control logic
– Can have very low placement densities : < 40%
• Microprocessor (µP) Random Logic Macros (RLM)
– Hierarchical partitions are placement instances (5-30K)
– High placement densities : 80%-98% (low whitespace)
– Many fixed I/Os, relatively few standard cells
7
Requirements for Placers (1)
• Must handle 4-10M cells, 1000s of macros
– 64 bits + near-linear asymptotic complexity
– Scalable/compact design database (OpenAccess)
• Accept fixed ports/pads/pins + fixed cells
• Place macros, esp. with var. aspect ratios
– Non-trivial heights and widths
(e.g., height=2rows)
• Honor targets and limits for net length
• Respect floorplan constraints
• Handle a wide range of placement densities
(from <25% to 100% occupied), ICCAD '02
8
Requirements for Placers (2)
9
Optimal Relative Order: A B C
[Figure sequence, slides 10-15: placements of cells A, B, C]
• To spread ... or not to spread
• Place to the left ... or to the right
15
Placement Footprints:
Standard Cell:
Data Path:
IP - Floorplanning
16
Placement Footprints:
Core
Reserved areas
IO Control
17
Placement Footprints:
Perimeter IO
Area IO
18
Unconstrained
Placement
19
Floor planned
Placement
20
VLSI Global Placement Examples
21
Major Placement Techniques
• Simulated Annealing
– Timberwolf package [JSSC-85, DAC-86]
– Dragon [ICCAD-00]
• Partitioning-Based Placement
– Capo [DAC-00]
– Fengshui [DAC-01]
• Analytical Placement
– Gordian [TCAD-91]
– Kraftwerk [DAC-98]
– FastPlace [ISPD-04]
• Hall’s Quadratic Placement
• Genetic Algorithm
22
Outline
• Wire length driven placement
• Main methods
– Simulated Annealing
• Gate-Array: Timberwolf package
• Standard-Cell: Timberwolf package, Dragon
– Partition-based methods
– Analytical methods
– Timing, congestion and other considerations
• Global placement (rough location)
• Detailed placement (legalization)
23
A down-to-earth method
• Cluster growth
– Select unplaced components and place them in slots
– SELECT: choose the unplaced component that is most
strongly connected to all (or any single one) of the placed
components
– PLACE: place the selected component at a slot such
that a certain “cost” of the partial placement is
minimized
– Simple and fast: ideal for initial placement
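The SELECT/PLACE loop can be sketched as follows. This is an illustrative version only: the connectivity score (number of nets shared with placed components) and the PLACE cost (Manhattan distance to connected placed components) are assumed choices, and `cluster_growth`, `nets`, `slots`, and `seed` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

Slot = Tuple[int, int]

def cluster_growth(nets: List[Set[str]], slots: List[Slot],
                   seed: str) -> Dict[str, Slot]:
    """Greedy constructive placement: repeat SELECT, then PLACE.

    Assumes there are at least as many slots as components.
    """
    cells = sorted(set().union(*nets))
    placed = {seed: slots[0]}          # seed goes to the first slot
    free = list(slots[1:])

    def connectivity(c: str) -> int:
        # SELECT score: nets that c shares with already placed components.
        return sum(1 for net in nets
                   if c in net and any(p in placed for p in net if p != c))

    def cost(c: str, slot: Slot) -> int:
        # PLACE cost (assumed): distance to connected placed components.
        total = 0
        for net in nets:
            if c in net:
                for p in net:
                    if p != c and p in placed:
                        px, py = placed[p]
                        total += abs(slot[0] - px) + abs(slot[1] - py)
        return total

    while len(placed) < len(cells):
        unplaced = [c for c in cells if c not in placed]
        c = max(unplaced, key=connectivity)           # SELECT
        best = min(free, key=lambda s: cost(c, s))    # PLACE
        placed[c] = best
        free.remove(best)
    return placed

nets = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
slots = [(0, 0), (1, 0), (2, 0), (3, 0)]
result = cluster_growth(nets, slots, seed="a")
print(result)
```

On this chain-shaped netlist the components end up in chain order along the row of slots, which is the kind of reasonable starting point an iterative placer can then improve.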
24
Simulated Annealing Based Placement
“The TimberWolf Placement and Routing Package”, Sechen, Sangiovanni; IEEE Journal of Solid-State Circuits, vol. SC-20, no. 2 (1985), 510-522
“TimberWolf 3.2: A New Standard Cell Placement and Global Routing Package”, Sechen, Sangiovanni; 23rd DAC, 1986, 432-439
TimberWolf
Stage 1
❁ Modules are moved between different rows as well as within the same row
❁ Module overlaps are allowed
❁ When the temperature is reduced below a certain value, stage 2 begins
Stage 2
❁ Remove overlaps
❁ The annealing process continues, but only interchanges adjacent modules within the same row
25
Solution Space
[Figure: modules placed in rows, with overlaps]
26
Neighboring Solutions
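Three types of moves:
M1: Displace a module to a new location
M2: Interchange two modules
M3: Change the orientation of a module (mirror it about an axis of reflection)
[Figure: example arrangements of modules 1-4 with the axes of reflection]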
27
Move Selection
28
Move Restriction
Range Limiter: rectangular window R
❁ At the beginning, R is very large, big enough to contain the whole chip
❁ The window size shrinks slowly as the temperature decreases. In fact, the height and width of R ∝ log(T)
❁ Stage 2 begins when the window is so small that no inter-row module interchanges are possible
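A small sketch of a range limiter whose window dimensions are proportional to log(T). The normalization (window equals the full chip at the starting temperature `T0`) and the minimum window size are assumptions for illustration, not values from the TimberWolf papers; `window_size` and `in_window` are hypothetical names.

```python
import math
from typing import Tuple

def window_size(T: float, T0: float, chip_w: float, chip_h: float,
                min_size: float = 1.0) -> Tuple[float, float]:
    """Width and height of R, scaled in proportion to log(T).

    Assumes T0 > 1 and T > 0; the window covers the chip when T == T0.
    """
    scale = max(0.0, math.log(T) / math.log(T0))
    return max(min_size, chip_w * scale), max(min_size, chip_h * scale)

def in_window(src: Tuple[float, float], dst: Tuple[float, float],
              win_w: float, win_h: float) -> bool:
    """Allow a move only if dst lies in the window centered on src."""
    return (abs(dst[0] - src[0]) <= win_w / 2
            and abs(dst[1] - src[1]) <= win_h / 2)

for T in (1000.0, 100.0, 10.0, 1.0):
    print(T, window_size(T, 1000.0, chip_w=100.0, chip_h=50.0))
```

Once the window height drops below the row pitch, no target in another row passes `in_window`, which matches the condition for entering stage 2.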
29
Cost Function
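$\Psi = C_1 + C_2 + C_3$
$C_1 = \sum_i (\alpha_i w_i + \beta_i h_i)$, where $w_i$ and $h_i$ are the width and height of the bounding box of net $i$
$\alpha_i$, $\beta_i$ are horizontal and vertical weights, respectively
$\alpha_i = 1$, $\beta_i = 1$ ⇒ 1/2 • perimeter of the bounding box
❁ Critical nets: increase both $\alpha_i$ and $\beta_i$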
Cost Function (Cont’d)
[Equation residue: a sum over module pairs i ≠ j]
31
Annealing Schedule
• $T_k = r(k) \cdot T_{k-1}$, k = 1, 2, 3, ...
r(k) increases from 0.8 to a maximum value of 0.94 and then decreases to 0.1
• At each temperature, a total of K•n attempts is made
n = number of modules
K = user-specified constant
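The schedule can be sketched as below. The slide gives only the endpoint values of r(k) (0.8, 0.94, 0.1); the piecewise-linear shape, the position of the peak (`peak_frac`), and the number of temperature steps are assumptions made for illustration.

```python
from typing import List

def cooling_rate(k: int, n_steps: int, r_start: float = 0.8,
                 r_peak: float = 0.94, r_end: float = 0.1,
                 peak_frac: float = 0.5) -> float:
    """r(k): rises from r_start to r_peak, then falls to r_end.

    The linear interpolation between these values is an assumed shape.
    """
    peak = max(1, int(peak_frac * n_steps))
    if k <= peak:
        return r_start + (r_peak - r_start) * (k - 1) / max(1, peak - 1)
    return r_peak + (r_end - r_peak) * (k - peak) / max(1, n_steps - peak)

def schedule(T0: float, n_steps: int) -> List[float]:
    """Temperatures T_1..T_n with T_k = r(k) * T_{k-1}."""
    temps, T = [], T0
    for k in range(1, n_steps + 1):
        T = cooling_rate(k, n_steps) * T
        temps.append(T)
    return temps

def attempts_per_temperature(K: int, n: int) -> int:
    """K * n move attempts at each temperature (n = number of modules)."""
    return K * n

temps = schedule(T0=1000.0, n_steps=10)
print([round(t, 3) for t in temps])
print(attempts_per_temperature(K=100, n=500))
```

Because r(k) stays close to 1 in the middle of the run, the temperature falls slowly there, and it drops quickly at the start and at the end.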
32
Dragon2000:
Standard-Cell Placement Tool for Large
Industry Circuits
33
Main Idea
• Simulated annealing based
– 1.9x faster than iTools 1.4.0 (commercial version of TimberWolf)
– Comparable wirelength to iTools (i.e., very good)
– Performs better for larger circuits
– Still very slow compared with other approaches
– Also shown to have good routability
• Top-down hierarchical approach
– hMetis to recursively quadrisect into 4^h bins at level h
– Swapping of bins at each level by SA to minimize WL
– Terminates when each bin contains < 7 cells
– Then swap single cells locally to further minimize WL
• Detailed placement is done by greedy algorithm
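A quick arithmetic sketch of the hierarchy depth, assuming the bin count is 4^h at level h (each quadrisection multiplies the number of bins by four) and that cells are spread evenly over the bins, which real partitions only approximate. The function names are hypothetical.

```python
def bins_at_level(h: int) -> int:
    """Number of bins after h levels of recursive quadrisection."""
    return 4 ** h

def levels_until_small(n_cells: int, max_cells: int = 7) -> int:
    """Smallest level h at which the average bin holds < max_cells cells."""
    h = 0
    while n_cells / bins_at_level(h) >= max_cells:
        h += 1
    return h

print(bins_at_level(3))             # bins at level 3
print(levels_until_small(100_000))  # depth for a 100K-cell design
```

Under these assumptions a 100K-cell design needs seven levels, since 4^6 = 4096 bins still average about 24 cells each while 4^7 = 16384 bins average about 6.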
34
Outline
• Wire length driven placement
• Main methods
– Simulated Annealing
• Gate-Array: Timberwolf package
• Standard-Cell: Timberwolf package, Grover, Dragon
– Partition-based methods
– Analytical methods
35
Partition based methods
• Partitioning methods
– FM
– Multilevel techniques, e.g., hMetis
• Two academic open source placement tools
– Capo (UCLA/UCSD/Michigan): multilevel FM
– Feng-shui (SUNY Binghamton): use hMetis
• Pros and cons
– Fast
– Not stable
36
Partitioning-based Approach
• Try to group closely connected modules together.
• Repetitively divide a circuit into sub-circuits such that the
cut value is minimized.
• Also, the placement region is partitioned (by cutlines)
accordingly.
• Each sub-circuit is assigned to one partition of the
placement region.
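The recursive scheme can be sketched on a toy netlist. To stay short, this version uses exhaustive balanced min-cut bisection as a stand-in partitioner (viable only for a handful of cells; a real placer would use FM or hMetis), alternates vertical and horizontal cutlines, and places each single cell at the center of its final region. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Region = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

def bisect(cells: Set[str], nets: List[Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Exhaustive balanced bisection minimizing the number of cut nets."""
    ordered = sorted(cells)
    sub = [net & cells for net in nets]      # restrict nets to this sub-circuit
    best_left, best_cut = None, None
    for left in combinations(ordered, len(ordered) // 2):
        ls = set(left)
        cut = sum(1 for net in sub if net & ls and net - ls)
        if best_cut is None or cut < best_cut:
            best_left, best_cut = ls, cut
    return best_left, cells - best_left

def place(cells: Set[str], nets: List[Set[str]], region: Region,
          vertical: bool = True,
          out: Optional[Dict[str, Tuple[float, float]]] = None):
    """Recursively split the circuit and the region together."""
    out = {} if out is None else out
    x0, y0, x1, y1 = region
    if len(cells) == 1:
        (c,) = cells
        out[c] = ((x0 + x1) / 2, (y0 + y1) / 2)
        return out
    a, b = bisect(cells, nets)
    if vertical:                              # vertical cutline: left | right
        xm = (x0 + x1) / 2
        ra, rb = (x0, y0, xm, y1), (xm, y0, x1, y1)
    else:                                     # horizontal cutline: bottom | top
        ym = (y0 + y1) / 2
        ra, rb = (x0, y0, x1, ym), (x0, ym, x1, y1)
    place(a, nets, ra, not vertical, out)
    place(b, nets, rb, not vertical, out)
    return out

nets = [{"a", "b"}, {"c", "d"}, {"b", "c"}]
coords = place({"a", "b", "c", "d"}, nets, (0, 0, 4, 4))
print(coords)
```

The first cut separates {a, b} from {c, d}, cutting only the net between b and c. Each half is then partitioned without looking at the other, so b and c can land in different rows even though they are connected.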
37
An Example
[Figure: a circuit, the cutline, and the resulting placement]
38
Variations
• There are many variations in the partitioning-based
approach. They are different in:
– The objective function used.
– The partitioning algorithm used.
– The selection of cutlines.
39
Partitioning:
Objective:
40
FM Partitioning:
After Cut 2
41
FM Partitioning:
Moves are made based on object gain.
- each object is assigned a gain
- objects are put into a sorted gain list
- the object with the highest gain from the larger of the two sides is selected and moved
- the moved object is "locked"
- gains of "touched" objects are recomputed
- gain lists are resorted
[Figure: example netlist annotated with object gains]
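The steps above can be sketched as a single simplified FM pass. For brevity this version recomputes every gain from scratch on each step instead of maintaining sorted gain lists, and it keeps the best balanced partition seen during the pass; the balance tolerance `tol` and all names are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

def gain(cell: str, side: Dict[str, int], nets: List[Sequence[str]]) -> int:
    """Cut nets that become uncut minus uncut nets that become cut."""
    g = 0
    for net in nets:
        if cell not in net or len(net) < 2:
            continue
        same = sum(1 for c in net if side[c] == side[cell])
        if same == 1:
            g += 1      # cell is alone on its side: moving it uncuts the net
        elif same == len(net):
            g -= 1      # net lies entirely on cell's side: moving it cuts the net
    return g

def cut_size(side: Dict[str, int], nets: List[Sequence[str]]) -> int:
    return sum(1 for net in nets if len({side[c] for c in net}) == 2)

def fm_pass(side: Dict[str, int], nets: List[Sequence[str]],
            tol: int = 1) -> Tuple[Dict[str, int], int]:
    """Move each object once (then lock it); return the best balanced state."""
    side = dict(side)
    locked = set()
    best_side, best_cut = dict(side), cut_size(side, nets)
    while len(locked) < len(side):
        n0 = sum(1 for s in side.values() if s == 0)
        larger = 0 if n0 >= len(side) - n0 else 1
        free = sorted(c for c in side if c not in locked and side[c] == larger)
        if not free:
            break
        cell = max(free, key=lambda c: gain(c, side, nets))  # highest gain
        side[cell] = 1 - larger                               # move it
        locked.add(cell)                                      # lock it
        n0 = sum(1 for s in side.values() if s == 0)
        cut = cut_size(side, nets)
        if abs(2 * n0 - len(side)) <= tol and cut < best_cut:
            best_side, best_cut = dict(side), cut
    return best_side, best_cut

nets = [("a", "b"), ("c", "d"), ("b", "c")]
start = {"a": 0, "c": 0, "b": 1, "d": 1}      # all three nets are cut
best_side, best_cut = fm_pass(start, nets)
print(best_side, best_cut)
```

Note that moves with zero or negative gain are still made; this is what lets the pass climb out of local minima, and the best intermediate state is the one that is kept.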
42
FM Partitioning (example, slides 43-56):
[Figure sequence: gain values of the objects after each successive move]
56
Quadrature Placement Procedure
[Figure: cutline order 1, 2, 3a, 3b, 4a, 4b]
57
Bisection Placement Procedure
[Figure: cutline order 1, 2a, 2b, 3a-3d, 4, 5a, 5b, 6a-6d]
58
Terminal Propagation Algorithm by Dunlop
and Kernighan
59
Problem of Partitioning Subcircuits
61
Terminal Propagation
62
Creating Circuit Rows
• Terminal propagation reduces overall area by ~30%
• Creating rows
– Choose α and β, preferably to balance row length
(during re-arrangement)
63
Can Recursive Bisection Alone Produce Routable
Placement?
(Name of placer: Capo)
64
Capo Overview
65
Capo Approach
66
Partitioning:
Pros:
- very fast
- great quality
- scales nearly linearly with problem size
Cons:
- non-trivial to implement
- very directed algorithm, but this limits the ability to deal with
miscellaneous constraints
- not stable (a minor change to the input can change the result)
67
Summary for Partition Based Placement
68