High Performance Design Challenges On 16FF+: Mitigations and Solutions
Avago Technologies
Bangalore, India
www.avagotech.com
ABSTRACT
The race to outperform, fueled by performance benefits, has drawn attention to the 16FinFET+ technology node.
The node brings new complexity and unfamiliar challenges to the timing, area and power parameters of a design.
This paper explains how sharp variations in resistance (R) across the metal layer stack, along with coupling
capacitance (Cc), have hit performance. It is a comprehensive study of preroute-to-postroute gaps, crosstalk
analysis and fixing, optimum power recovery implementation and congestion mitigation. The paper proposes
distinct methodologies and the tool features used to address these issues. The methodology has shown
improvement, which matters for the size and complexity of contemporary designs.
In the last two years there have been significant developments on 16FF. The technology has revolutionized the structure
of the transistor. This revolution has brought new advantages in performance, power and cost. Meanwhile there have been
many flow and methodology updates for physical design engineers. Still, the technology suffers from challenges in
congestion, timing, DRC and power.
Similar challenges existed in earlier technology nodes as well. But as we have scaled down further and also changed the
structure of the transistor, various new factors have appeared. Primarily temperature inversion [1],
layer resistance [2] and double pattern (DPT) layers [3] drive these challenges. Temperature inversion [1] has
made timing analysis for the node different and difficult. Layer resistance [2] and crosstalk [3] have made timing
closure harder.
The traditional approaches could not adapt to the new challenges. The methodology for the discussed challenges takes
each problem conceptually and is based on rectifying the root cause. The approach is based on the ICC and PrimeTime
platforms. The gravity of the issues, the extent of mitigation and a detailed analysis of the results are discussed.
Performance and quality of results are not impacted. There is a tradeoff of run time against a better correlation gap.
The approach comes with a limitation: the subchip owner has to be careful of its caveats and pitfalls.
2. Design stats:
The subchips under study were of varied complexity. The technology used is 16FF+ with a 12-layer metal stack,
a set of six odd and six even layers. The following table illustrates the complexity of the test cases:
Congestion, as shown by the congestion map, is the measure of routing feasibility of the implemented design. Congestion
is an outcome of two factors in this node. The former is the push to put more performance and complexity into the design
and to increase utilization. The latter is the reduced routability of the lower layers due to double patterning. Together
these have increased the deficit of available versus required routing resources. The same can be inferred from the
design's congestion map.
The test case taken for congestion is a design with complex logic connectivity and a large cell count. The following
figure shows the congestion challenge we faced in this design.
Congestion overflow:
28.81%(Overall)
32.8%(Horizontal)
24.9%(Vertical)
Run time : 40 Hours
Traditional congestion mitigation helped reduce the overflow number a little, but it could not make the design routable.
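To make the overflow figures above concrete, the following minimal sketch shows one way such percentages can be computed. It assumes overflow is defined as the share of global routing cells whose track demand exceeds capacity, counted per direction and overall; that definition, the function names and the sample numbers are illustrative assumptions, not the tool's exact report format.

```python
def overflow_percent(demand, capacity):
    """Percentage of routing cells whose track demand exceeds capacity.

    Assumption: overflow is counted per global routing cell; the real
    tool report may use a different definition.
    """
    if len(demand) != len(capacity):
        raise ValueError("demand and capacity must have the same length")
    if not demand:
        return 0.0
    overflowed = sum(1 for d, c in zip(demand, capacity) if d > c)
    return 100.0 * overflowed / len(demand)


def congestion_report(h_demand, h_capacity, v_demand, v_capacity):
    """Summarize overflow for horizontal, vertical and combined resources."""
    return {
        "overall": overflow_percent(h_demand + v_demand,
                                    h_capacity + v_capacity),
        "horizontal": overflow_percent(h_demand, h_capacity),
        "vertical": overflow_percent(v_demand, v_capacity),
    }


if __name__ == "__main__":
    # Hypothetical demand/capacity values for four routing cells.
    print(congestion_report([5, 3, 8, 2], [4, 4, 4, 4],
                            [1, 6, 2, 2], [4, 4, 4, 4]))
```

With equal cell counts per direction, the overall number in this sketch is the mean of the horizontal and vertical numbers, which is roughly the relationship seen in the figures reported above.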
4. Timing Difficulties:
Three timing difficulties are discussed: temperature inversion complicates timing analysis, while layer
resistance and crosstalk make timing closure a nightmare. The place-to-route timing gap is one of the worst challenges
one faces in 16FF.
“Unlike the 28nm planar node, a place-closed design may not be closed at route.”
The experiments and the proposed flow were studied on subchips of varied complexity and found to be consistent and robust.
In older CMOS technologies, cell delay increases when temperature rises. But in smaller nodes (<65nm)
delay decreases as temperature rises. With an increase in temperature, mobility and threshold
voltage both decrease; these affect drive current, and eventually cell delay, in opposite directions.
Despite the decrease in mobility, cell delay decreases with a rise in temperature because the change in threshold
voltage dominates at lower supply voltages. As the DUT uses 16nm FinFET, it is expected to show temperature
inversion like conventional lower-node designs. But ULVT (ultra-low Vth) cells do not follow the same
trend: their Vth is itself very small, so they do not show inversion, unlike cells of higher threshold voltage
(SVT and HVT).
Experiments and analysis show that ULVT cells are ~6% slower at the higher temperature than at the
lower operating temperature. This differing behavior demands that more corners/scenarios be deployed for
complete coverage, leading to a higher TAT.
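The competing mobility and threshold-voltage effects described above can be illustrated with a first-order alpha-power delay model. The sketch below is an assumption-laden illustration, not the characterization used in this work: the mobility exponent, Vth temperature coefficient, alpha, supply and threshold values are all hypothetical, so the magnitudes it produces are not calibrated to the ~6% figure reported here; only the direction of the trend is meaningful.

```python
def relative_delay(vdd, vth_cold, temp_c, *, t_ref_c=-40.0,
                   mobility_exp=1.5, vth_tc=1.0e-3, alpha=1.3):
    """First-order cell delay (arbitrary units) from the alpha-power law.

    All default parameters are illustrative assumptions:
      mobility ~ (T / T_ref) ** -mobility_exp   (T in kelvin)
      Vth(T)   = vth_cold - vth_tc * (T - T_ref)
      I_drive  ~ mobility * (Vdd - Vth) ** alpha
      delay    ~ Vdd / I_drive
    """
    t_ref_k = t_ref_c + 273.15
    t_k = temp_c + 273.15
    mobility = (t_k / t_ref_k) ** (-mobility_exp)
    vth = vth_cold - vth_tc * (temp_c - t_ref_c)
    overdrive = vdd - vth
    if overdrive <= 0:
        raise ValueError("supply must exceed threshold voltage")
    drive_current = mobility * overdrive ** alpha
    return vdd / drive_current


def hot_to_cold_ratio(vdd, vth_cold, t_hot_c=125.0, t_cold_c=-40.0):
    """Delay at the hot corner divided by delay at the cold corner."""
    return (relative_delay(vdd, vth_cold, t_hot_c)
            / relative_delay(vdd, vth_cold, t_cold_c))


if __name__ == "__main__":
    # Hypothetical supply and cold-corner threshold voltages, in volts.
    for name, vth in (("HVT", 0.55), ("ULVT", 0.25)):
        ratio = hot_to_cold_ratio(0.72, vth)
        trend = "inversion" if ratio < 1.0 else "no inversion"
        print(f"{name}: hot/cold delay ratio = {ratio:.2f} ({trend})")
```

A ratio below 1 means the cell is faster when hot (temperature inversion). With these assumed values the high-Vth cell inverts, because the Vth drop is large relative to its small overdrive, while the low-Vth cell does not.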
Scaled Resistance
Layer   Scaled resistance
M1      500
M2      500
M3      500
M4      500
M5      156.5217391
M6      143.4782609
M7      58.69565217
M8      69.56521739
M9      11.08695652
M10     12.39130435
M11     1.02173913
M12     0.608695652
M13     1
The above graph shows that, for the given layer stack, the resistances of the lower metal layers (<= Metal 4) are much
higher than those of the upper metal layers (> Metal 4). This leads the placer engine to misestimate wire delay,
causing major timing gaps after routing.
Optimal usage of “layer_opt” and upfront addition of pessimism help bridge the gap.
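The size of the gap can be illustrated with the scaled resistance values listed above. The sketch below assumes, purely for illustration, that the pre-route estimator applies one resistance averaged over a set of layers, and compares it with the resistance of the layer a net actually lands on. The function names are hypothetical, and the averaging assumption is a simplification of whatever the placer really does.

```python
# Scaled per-layer resistance values, copied from the listing above.
SCALED_R = {
    "M1": 500.0, "M2": 500.0, "M3": 500.0, "M4": 500.0,
    "M5": 156.5217391, "M6": 143.4782609,
    "M7": 58.69565217, "M8": 69.56521739,
    "M9": 11.08695652, "M10": 12.39130435,
    "M11": 1.02173913, "M12": 0.608695652, "M13": 1.0,
}


def averaged_resistance(layers=None):
    """Mean scaled resistance over the given layers (default: all layers).

    Assumption: the pre-route estimator uses a single averaged value.
    """
    names = list(layers) if layers is not None else list(SCALED_R)
    return sum(SCALED_R[name] for name in names) / len(names)


def misestimation_factor(routed_layer, assumed_layers=None):
    """Actual resistance divided by the assumed pre-route resistance.

    A value above 1 means the pre-route estimate was optimistic.
    """
    return SCALED_R[routed_layer] / averaged_resistance(assumed_layers)


if __name__ == "__main__":
    for layer in ("M2", "M6", "M11"):
        print(f"{layer}: actual / assumed = {misestimation_factor(layer):.3f}")
```

Under this assumption a net routed on M2 sees well over twice the resistance the estimator expected, while a net promoted to M11 sees a small fraction of it, which is why steering critical nets to upper layers and adding pessimism up front both narrow the gap.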
In older CMOS technologies, including planar 28 nm, crosstalk was an important parameter, but it
was not as dominant as in 3D 16FF+ technology. The primary reason for this behavior is the taller, denser and
narrower interconnects used in the 16FF+ node. This makes the crosstalk between two neighboring
interconnects in the same layer much more dominant than the crosstalk between interconnects in two different layers.
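A simple parallel-plate approximation shows why taller, narrower and denser wires shift the balance toward same-layer coupling. The sketch below is an assumption: it ignores fringing fields, treats the dielectric as uniform and uses hypothetical dimensions in arbitrary units, so it indicates the trend only.

```python
def coupling_to_ground_ratio(width, thickness, spacing, dielectric_height):
    """Ratio of same-layer coupling cap to adjacent-layer cap per unit length.

    Parallel-plate approximation with a uniform dielectric (assumption):
      sidewall coupling to one neighbor ~ thickness / spacing
      plate cap to the adjacent layer   ~ width / dielectric_height
    """
    for value in (width, thickness, spacing, dielectric_height):
        if value <= 0:
            raise ValueError("all dimensions must be positive")
    c_couple = thickness / spacing
    c_ground = width / dielectric_height
    return c_couple / c_ground


if __name__ == "__main__":
    # Hypothetical wire cross-sections, arbitrary units.
    relaxed = coupling_to_ground_ratio(width=2, thickness=2,
                                       spacing=2, dielectric_height=2)
    dense = coupling_to_ground_ratio(width=1, thickness=2,
                                     spacing=1, dielectric_height=2)
    print(relaxed, dense)
```

Halving both width and spacing while keeping the wire height fixed quadruples the ratio in this model, since the sidewall capacitance grows while the plate capacitance to the adjacent layer shrinks.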
The consequences of this significant increase in crosstalk adversely affect the QoR of the design. These include:
The significant timing improvement achieved with the proposed crosstalk flows has almost closed the block
compared to the conventional route_opt mega command, and this is seen across all the subchips.
For crosstalk-driven flow 1 some TAT impact was observed; part of it was recovered in crosstalk-driven flow 2
by tuning the search and repair loops.
6. DRC challenges:
The DRC challenges seen in the 16FF+ node are escalated to a new height by the addition of Local Double
Pattern loops.
Out of the 12 layers used in the 16FF+ implementation, 4 layers use double pattern masks. Double pattern
masks allow much denser interconnects but come with the unique challenge of Local Double Pattern
(DPT) Loop DRCs. The DPT issue arises when the tool is not able to decide on which mask a particular metal
shape should be placed. It also poses a significant challenge for manual fixes, if they are required.
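One common way to reason about the mask decision is as a two-coloring problem: shapes that sit closer than the same-mask spacing must go on different masks, and an odd-length loop of such conflicts cannot be colored. The sketch below illustrates that view with a breadth-first search; it is not the router's or the signoff tool's actual algorithm, and the shape indices and conflict pairs are hypothetical inputs.

```python
from collections import deque


def assign_masks(num_shapes, conflicts):
    """Assign each shape to mask 0 or 1, or return None on an odd loop.

    conflicts: iterable of (a, b) index pairs for shapes that are too
    close to share a mask (hypothetical input for this sketch).
    """
    neighbors = [[] for _ in range(num_shapes)]
    for a, b in conflicts:
        neighbors[a].append(b)
        neighbors[b].append(a)

    mask = [None] * num_shapes
    for start in range(num_shapes):
        if mask[start] is not None:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:
            shape = queue.popleft()
            for other in neighbors[shape]:
                if mask[other] is None:
                    # Conflicting shapes must land on opposite masks.
                    mask[other] = 1 - mask[shape]
                    queue.append(other)
                elif mask[other] == mask[shape]:
                    # Odd-length conflict loop: no legal assignment.
                    return None
    return mask


if __name__ == "__main__":
    print(assign_masks(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # even loop
    print(assign_masks(3, [(0, 1), (1, 2), (2, 0)]))          # odd loop
```

In this view, fixing a DPT loop means breaking one conflict edge of the odd loop, for example by moving or re-spacing a shape, which is why manual fixes need to keep the whole loop in mind.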
MVT Swap:
Though VT swap related issues are not uncommon in planar technologies, they present a new challenge in the
16FF+ node due to the size reduction of the cells. VT swap related issues arise when a small cell gets
sandwiched between cells of a different VT type. The issue has been mitigated by disallowing a few small-sized
cells in the design.
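A check for the sandwiched-cell condition described above could be sketched as follows. The cell record, the width threshold and the assumption that each row is given as abutted cells in left-to-right order are all hypothetical; the actual rule depends on the foundry's implant spacing requirements.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    name: str
    vt: str        # e.g. "ULVT", "SVT", "HVT"
    width: float   # cell width in arbitrary units


def find_sandwiched_cells(row, min_width):
    """Return names of narrow cells whose two neighbors both differ in VT.

    Assumptions: `row` lists abutted cells left to right, and `min_width`
    is a hypothetical threshold below which a VT island is too narrow.
    """
    flagged = []
    for left, cell, right in zip(row, row[1:], row[2:]):
        if (cell.width < min_width
                and left.vt != cell.vt
                and right.vt != cell.vt):
            flagged.append(cell.name)
    return flagged


if __name__ == "__main__":
    row = [
        Cell("u1", "SVT", 4), Cell("u2", "ULVT", 1), Cell("u3", "SVT", 4),
        Cell("u4", "SVT", 1), Cell("u5", "SVT", 4),
    ]
    print(find_sandwiched_cells(row, min_width=2))
```

Disallowing the smallest cells, as done in this work, removes the condition at the source, since no cell narrower than the threshold can then appear in a row.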
Frozen Routes:
The custom clock is implemented with frozen routes, which prevents the router from addressing the DRCs related to
these routes. The issue has been addressed by temporarily removing the freeze attribute from the custom clock nets and
running search and repair loops to remove the DRCs before signal routing.
As shown in the above flow chart, the DRC challenge has been addressed by reserving extra resources from the
beginning, addressing custom clock related DRCs and, at the end, employing the commands signoff_drc and
autofix_signoff_drc to address DPT and other DRC violations. A few manual fixes have also been done,
keeping DPT in mind.
[Figure: comparison of Cell, Pin Cap and Wire Cap for Planar-28 vs. 16FF; vertical axis from 0 to 1]
This gap between dynamic and leakage power means a change in methodology, with power computation
shifted to the slower corner.
It has been observed that PrimeTime is highly efficient in power recovery, with the following drive strength
and VT distribution before and after power recovery.
8. Conclusion:
16FF+ comes with advantages in power, performance and cost. The challenges due to the structure change cannot be
ignored. Temperature inversion, layer stack resistance and crosstalk pose difficulties in timing analysis and closure.
The challenges of temperature inversion could be solved by hybrid scenario creation, while layer_opt and crosstalk-aware
optimizations could solve the issues of timing closure and the timing correlation gap.
The power recovery flow was updated to suit a dynamic-power-dominant technology node, and the tool gave a favorable
VT distribution and drive strength distribution.
With the proposed flow, we can get the best out of 16FF.
1. Benjamin M., “Low Power Design in the FinFET Technology”, SNUG Santa Clara, 2014.
2. Synopsys, ICC Manual, “[email protected] user manual”.