High Performance Design Challenges On 16FF+: Mitigations and Solutions

Akshay Pagariya, Himanshu Gupte, Piyush Sharma, Priyank Laad
Avago Technologies
Bangalore, India
www.avagotech.com
ABSTRACT
The race to outperform, fueled by performance benefits, has drawn attention to the 16FinFET+ technology node.
This has brought new complexity and unfamiliar challenges to the timing, area and power parameters of the design.
The paper explains how sudden variations in resistance (R) across the metal layer stack, along with coupling
capacitance (Cc), have hit performance. It is a comprehensive study of preroute-to-postroute gaps, crosstalk
analysis and fixing, optimum power recovery implementation and congestion mitigation. This paper proposes
distinct methodologies and tool features used to address these issues. The methodology has shown
improvement, which matters greatly given the size and complexity of contemporary designs.
SNUG 2015
Table of Contents
1. Introduction
2. Design stats
3. Congestion Issue
3.1 Congestion Challenge
3.2 Congestion Mitigation
4. Timing Difficulties
4.1 Temperature Inversion
4.2 Layer Metal Stack Resistance Variation
5. Cross Talk Impact
6. DRC challenges
6.1 DRC mitigations
7. Power Recovery Challenge
7.1 Implementation Approach
7.2 Tool Approach
8. Conclusion
9. References

1. Introduction
In the last two years there have been significant developments on 16 FF. The technology has revolutionized the structure
of the transistor. This revolution has brought new advantages in performance, power and cost. Meanwhile, there have been
many flow and methodology updates for physical design engineers. Still, the technology suffers from challenges in
congestion, timing, DRC and power.
Similar challenges existed in earlier technology nodes as well. But as we have scaled down further and also changed the
structure of the transistor, various new factors have appeared. Primarily, temperature inversion [1],
layer resistance [2] and double-patterned (DPT) layers [3] drive these challenges. Temperature inversion [1] has
made timing analysis for the node different and difficult. Layer resistance [2] and crosstalk [3] have made timing
closure harder.
The traditional approaches could not adapt to the new challenges. The methodology for the discussed challenges takes
each problem conceptually and is based on rectifying the root cause. The approach is based on the ICC and PrimeTime
platforms. The gravity of the issues, the extent of mitigation and a detailed analysis of the results are discussed.
Performance and quality of result are not impacted. There is a tradeoff of run time for a smaller correlation gap. The
approach comes with a limitation: the subchip owner has to be careful of its caveats and pitfalls.
2. Design stats
The subchips under study were of varied complexity. The technology used is 16 FF+ with a 12-layer metal stack,
which is a set of six odd and six even layers. The following table illustrates the complexity of the test cases:
Cell Count        1.4 million+
Gate Count        5 million+
Macro Count       90+
Frequency         1 GHz
Characteristics   12 Layer Stack, Custom Clock
Technology        16 FF+
More information on Clock Methodology is available to Avago Employees.

The subchips were analysed for congestion, timing, DRC and power recovery difficulties. The flow and methodology
proposed were applied individually to each subchip, and the results were found to correlate.

3. Congestion Issue
Congestion, as shown by the congestion map, is a measure of the routing feasibility of the implemented design. In this
node congestion is an outcome of two factors. The former is the push to put more performance and complexity into the
design and increase utilization. The latter is the reduced routability of the lower layers due to double patterning.
Together these have increased the deficit between available and required routing resources, which can be inferred from
the design's congestion map.
3.1 Congestion Challenge
The test case taken for congestion is a design with complex logic connectivity and a large cell count. The following
data shows the congestion challenge we faced in this design.
Congestion overflow:
28.81% (Overall)
32.8% (Horizontal)
24.9% (Vertical)
Run time: 40 Hours
Traditional congestion mitigation helped reduce the overflow numbers a bit, but could not make the design routable.
3.2 Congestion Mitigation

This flow is a composition of flow practices and mitigations at each stage of the implementation process. Given
the criticality of the problem, mitigation has to be planned from the netlist level itself.
• Choose DCG (placement-aware synthesized netlist) to improve global connections. Break complex cells to improve
pin accessibility and local connections.
• Use micro commands to get a better seed for placement and optimization.
• Use congestion-aware placement optimization.
• Add a placement blockage mesh and keepout margins, scripted from local congestion overflow numbers.
• Refine placement to obey keepouts and placement blockages, then legalize the placement and follow with
optimization.
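The scripted keepout step above can be pictured with a small sketch. It assumes per-region overflow numbers have already been extracted from the congestion report; the region names, thresholds and margin values are hypothetical and are not taken from the paper or from any tool.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str            # hypothetical region label
    overflow_pct: float  # local congestion overflow of the region, in percent


def keepout_margin_um(overflow_pct: float) -> float:
    """Map local overflow to a keepout margin in microns (illustrative thresholds)."""
    if overflow_pct < 2.0:
        return 0.0   # region is routable as-is, no extra margin
    if overflow_pct < 10.0:
        return 0.2   # moderate overflow, small margin
    return 0.5       # heavy overflow, largest margin


# Hypothetical overflow numbers for three regions of a floorplan.
regions = [Region("core_a", 0.5), Region("core_b", 6.0), Region("core_c", 25.0)]
margins = {r.name: keepout_margin_um(r.overflow_pct) for r in regions}
print(margins)
```

In a real flow the resulting margins would be written out as tool commands; that step is tool-specific and is left out here.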

Congestion QoR for different mitigation strategies
                      Place_opt     Custom    DCG/        Logic    Mitigation
                      -congestion   Keepout   Blockages   Break    Flow
Horizontal Overflow   28%           14%       14%         6.00%    2.40%
Vertical Overflow     32%           18%       11%         10%      4.70%
Total Overflow        24.90%        11%       2%          2%       0.15%
Run Time              40 Hrs        36 Hrs    34 Hrs      22 Hrs   19 Hrs
4. Timing Difficulties
Three timing difficulties are discussed: temperature inversion complicates timing analysis, while layer
resistance and crosstalk make timing closure a nightmare. The place-to-route timing gap is one of the worst challenges
one faces in 16 FF.
“Unlike the 28nm planar node, a design closed at placement may not be closed at route.”
The experiments and the proposed flow were studied on subchips of varied complexity and found to be consistent and robust.
4.1 Temperature Inversion
At older CMOS technologies, cell delay increases as temperature rises. But in smaller nodes (<65nm)
delay varies inversely with temperature. As temperature increases, both mobility and threshold
voltage decrease, and these have opposing effects on drive current and hence on cell delay.
Despite the decrease in mobility, cell delay decreases with rising temperature because the change in threshold
voltage dominates at lower supply voltages. As the DUT uses 16nm FinFET, it is expected to show temperature
inversion like conventional designs at smaller nodes. But ULVT (ultra-low Vth) cells do not follow the same
trend: because Vth itself is very small, they do not show inversion, unlike cells of higher threshold voltage
(SVT and HVT).
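The competition between mobility and threshold voltage can be illustrated with a first-order alpha-power-law sketch. This is a textbook-style toy model, not the characterization used in the paper: the supply voltage, threshold voltages, mobility exponent and Vth temperature coefficient below are all assumed values chosen only to show the trend.

```python
def relative_delay(temp_k, vdd, vth_300k, alpha=1.3, mob_exp=1.5, vth_tc=1.0e-3):
    """Relative gate delay from a first-order alpha-power-law model.

    All parameters are illustrative assumptions:
      mob_exp : mobility falls as (T / 300 K) ** -mob_exp
      vth_tc  : threshold voltage falls by vth_tc volts per kelvin
    """
    mobility = (temp_k / 300.0) ** (-mob_exp)
    vth = vth_300k - vth_tc * (temp_k - 300.0)
    overdrive = vdd - vth
    if overdrive <= 0.0:
        raise ValueError("model only valid above threshold")
    drive_current = mobility * overdrive ** alpha
    return vdd / drive_current


COLD_K, HOT_K = 233.15, 398.15  # -40 C and 125 C
VDD = 0.6                       # assumed low supply voltage


def hot_to_cold_ratio(vth_300k):
    """Delay at the hot corner divided by delay at the cold corner."""
    return relative_delay(HOT_K, VDD, vth_300k) / relative_delay(COLD_K, VDD, vth_300k)


ulvt_ratio = hot_to_cold_ratio(0.20)  # assumed low-Vth device
hvt_ratio = hot_to_cold_ratio(0.45)   # assumed high-Vth device
print(f"ULVT hot/cold delay ratio: {ulvt_ratio:.2f}")
print(f"HVT  hot/cold delay ratio: {hvt_ratio:.2f}")
```

With these assumed numbers the low-Vth device comes out slower when hot (ratio above 1) and the high-Vth device faster when hot (ratio below 1), which is the qualitative split described above. The magnitudes are not calibrated to any library.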
Observation and Mitigation:
Experiments and analysis show that ULVT cells are ~6% slower at higher temperature than at
lower operating temperature. This differing behavior demands more corners/scenarios for
complete coverage, leading to higher TAT.
In the DUT, hybrid scenarios with worst/best delays are modelled so as to reduce TAT without any coverage
loss.
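One possible reading of the hybrid scenario idea is sketched below: for each VT class, take the slowest (or fastest) delay seen across the temperature corners and fold them into a single scenario. The paper does not describe how its hybrid scenarios were built, so this is an assumption. The ULVT value of 1.06 mirrors the ~6% observation; every other number is hypothetical.

```python
# Relative cell delay per VT class at each temperature corner.
# Only the ULVT hot/cold relation (~6% slower when hot) comes from the text;
# the SVT and HVT numbers are made up for illustration.
corner_delay = {
    "cold": {"ULVT": 1.00, "SVT": 1.10, "HVT": 1.25},
    "hot": {"ULVT": 1.06, "SVT": 1.02, "HVT": 1.08},
}


def hybrid_scenario(delays_by_corner, pick):
    """Fold several corners into one by picking a delay per VT class."""
    vt_classes = next(iter(delays_by_corner.values())).keys()
    return {
        vt: pick(corner[vt] for corner in delays_by_corner.values())
        for vt in vt_classes
    }


worst = hybrid_scenario(corner_delay, max)  # candidate for setup analysis
best = hybrid_scenario(corner_delay, min)   # candidate for hold analysis
print(worst)
print(best)
```

Two hybrid scenarios then stand in for four corner runs, which is where the TAT saving would come from under this reading.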
4.2 Layer Metal Stack Resistance Variation

At smaller nodes metal layer resistance varies considerably across the stack, which impacts preroute resistance
estimation, especially for long nets. This leads to increased buffer count at preroute and suboptimal quality of result.
Layer   Scaled Resistance
M1      500
M2      500
M3      500
M4      500
M5      156.52
M6      143.48
M7      58.70
M8      69.57
M9      11.09
M10     12.39
M11     1.02
M12     0.61
M13     1

Observation and Mitigation:
The data above shows that, for the given layer stack, the resistances of the lower metal layers (<= Metal 4) are much
higher than those of the upper metal layers (> Metal 4). This results in mis-estimation by the placer engine, leading
to major timing gaps after routing.
Optimal usage of “layer_opt” and upfront addition of pessimism help bridge the gap.
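To see why a single preroute resistance estimate can miss badly, the sketch below applies the distributed-RC (Elmore) wire delay, 0.5 * r * c * L^2, to the scaled resistances of the stack (rounded to two decimals). It assumes, purely for illustration, that the preroute estimate uses the stack average resistance and that capacitance per unit length is the same on every layer; neither assumption comes from the paper or the tool.

```python
# Scaled per-layer resistance, rounded from the table in this section.
SCALED_R = {
    "M1": 500.0, "M2": 500.0, "M3": 500.0, "M4": 500.0,
    "M5": 156.52, "M6": 143.48, "M7": 58.70, "M8": 69.57,
    "M9": 11.09, "M10": 12.39, "M11": 1.02, "M12": 0.61, "M13": 1.0,
}


def wire_delay(r_per_len, length, c_per_len=1.0):
    """Elmore delay of a distributed RC wire, in arbitrary units."""
    return 0.5 * r_per_len * c_per_len * length ** 2


# Assumption: the preroute estimate sees one average resistance for all layers.
avg_r = sum(SCALED_R.values()) / len(SCALED_R)


def estimate_ratio(layer, length=1.0):
    """Preroute estimate divided by the delay on the layer actually routed."""
    return wire_delay(avg_r, length) / wire_delay(SCALED_R[layer], length)


for layer in ("M2", "M8", "M12"):
    print(f"{layer}: estimate / actual = {estimate_ratio(layer):.2f}")
```

Under these assumptions a net that ends up on M2 is optimistically estimated (ratio below 1) while a net on M8 or above is pessimistically estimated, so both missed timing and unnecessary buffering are possible from the same averaged model.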
[Figure: No Layer_opt vs. Optimal Layer_opt]
Conclusions of the Layer_opt experiment
• This shows improved correlation in WNS numbers from place to route.
• TNS also shows a similar trend across subchips.
• Buffer area is also reduced.
• Similar evidence is shown by route length in higher layers.

5. Cross Talk Impact
At older CMOS technologies, including planar 28 nm, crosstalk was an important parameter but it
was not as dominant as in 3D 16FF+ technology. The primary reason for this behavior is the taller, denser and
narrower interconnects used in the 16FF+ node. This makes crosstalk between two neighboring
interconnects in the same layer much more dominant than crosstalk between interconnects in two different layers.
This significant increase in crosstalk adversely affects the QoR of the design. The consequences include:
• Poor pre-route to post-route timing correlation
• More SI (delay and noise) violations
• Degraded slow and fast corner cell delays
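The effect of coupling capacitance on both slow and fast corner delays can be sketched with the common Miller-factor approximation, in which the coupling capacitance counts 0x, 1x or 2x depending on how the aggressor switches relative to the victim. This is a simplified first-order model, not the signal integrity analysis performed by the tools, and all component values are hypothetical.

```python
# Miller factor applied to the coupling capacitance for each aggressor behavior.
MILLER = {"same_direction": 0.0, "quiet": 1.0, "opposite_direction": 2.0}


def effective_cap(c_ground, c_couple, aggressor):
    """Capacitance seen by the victim driver under the Miller approximation."""
    return c_ground + MILLER[aggressor] * c_couple


def stage_delay(r_drive, c_ground, c_couple, aggressor):
    """50% delay of a lumped RC stage: 0.69 * R * C_effective."""
    return 0.69 * r_drive * effective_cap(c_ground, c_couple, aggressor)


# Hypothetical victim net where coupling equals ground capacitance.
R_DRIVE, C_GROUND, C_COUPLE = 1.0, 1.0, 1.0
fast = stage_delay(R_DRIVE, C_GROUND, C_COUPLE, "same_direction")
nominal = stage_delay(R_DRIVE, C_GROUND, C_COUPLE, "quiet")
slow = stage_delay(R_DRIVE, C_GROUND, C_COUPLE, "opposite_direction")
print(fast, nominal, slow)
```

As the coupling share of total capacitance grows, the spread between the fast and slow cases widens, which is one way to picture why both corners degrade.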

Mitigation:
High crosstalk impact was observed when the route_opt mega command was used. Hence, to alleviate the crosstalk
effect, routing was done in stages (global, track and detail) with crosstalk-aware track optimization. The
crosstalk-aware track optimization provides a better seed for detail routing in terms of crosstalk.
[Figure: Proposed flows to address crosstalk: X-talk Flow, Flow1]
The significant timing improvement achieved with the proposed crosstalk flows has almost closed the block,
compared to the conventional route_opt mega command, and this is seen across all the subchips.
For crosstalk-driven flow 1 some TAT impact was observed, which was recovered slightly by tuning
the search and repair loops in crosstalk-driven flow 2.
Consistent results were observed across subchips of varied complexity.
6. DRC challenges
The DRC challenges seen in 16FF+ nodes escalate to a new height due to the addition of Local Double
Pattern loops.
Double Pattern Loops:
Out of the 12 layers used in the 16FF+ implementation, 4 layers use double pattern masks. Double pattern
masks allow much denser interconnects but come with the unique challenge of Local Double Pattern
(DPT) Loop DRCs. The DPT issue arises when the tool is not able to decide which mask a particular metal
shape should be assigned to. It also poses a significant challenge when manual fixes are required.
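The mask decision problem can be viewed as two-coloring a conflict graph: shapes closer than the same-mask spacing must go on different masks, and an odd cycle of such conflicts has no valid assignment. The sketch below shows that view with a breadth-first two-coloring. The shape names and conflict lists are hypothetical, and real decomposition engines handle far more rules than this.

```python
from collections import deque


def assign_masks(shapes, conflicts):
    """Try to split shapes across two masks.

    conflicts is a list of (a, b) pairs that are too close to share a mask.
    Returns a dict shape -> 0 or 1, or None when an odd loop makes it impossible.
    """
    neighbors = {s: [] for s in shapes}
    for a, b in conflicts:
        neighbors[a].append(b)
        neighbors[b].append(a)

    mask = {}
    for start in shapes:
        if start in mask:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nb in neighbors[cur]:
                if nb not in mask:
                    mask[nb] = 1 - mask[cur]  # neighbor goes on the other mask
                    queue.append(nb)
                elif mask[nb] == mask[cur]:
                    return None               # odd loop: no legal assignment
    return mask


chain = assign_masks(["a", "b", "c"], [("a", "b"), ("b", "c")])
loop = assign_masks(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
print(chain)  # a legal two-mask assignment
print(loop)   # None, the three shapes form an odd loop
```

Breaking such a loop means moving or resizing one shape so that one conflict edge disappears, which hints at why manual fixes on these layers are delicate.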
MVT Swap:
Though VT swap related issues are not uncommon in planar technologies, they present a new challenge in the
16FF+ node due to the size reduction of the cells. VT swap related issues arise when a small cell gets
sandwiched between cells of a different VT type. The issue has been mitigated by disallowing a few small-sized
cells in the design.
Frozen Routes:
The custom clock has frozen routes, which prevents the router from addressing DRCs related to these routes.
The issue has been addressed by momentarily removing the freeze attribute from the custom clock nets and
running search and repair loops to remove DRCs before signal routing.

6.1 DRC mitigations
As shown in the flow chart above, the DRC challenge has been addressed by reserving extra resources from the
beginning, addressing custom clock related DRCs, and finally employing the commands signoff_drc and
autofix_signoff_drc to address DPT and other DRC violations. A few manual fixes have also been done,
keeping DPT in mind.

7. Power Recovery Challenge
In planar nodes, leakage power was dominant, with a large focus on low-leakage designs. However, the shift to 3D
FinFET has changed that and brought new challenges in terms of power. FinFET input capacitance is higher for
the same gain compared to its planar counterpart, owing to its three-dimensional structure. This, combined with
lower leakage due to greater gate control, leads to designs which are dynamic power dominant. It is observed
that gate capacitance is greater than wire capacitance, whereas in planar 28 nm it was the
opposite, as demonstrated in the graph below.
[Figure: Normalized cell pin capacitance and wire capacitance, Planar-28 vs. 16FF]
This gap between dynamic and leakage power means a change in methodology, with power computation
shifted to the slower corner.
7.1 Implementation Approach

To control dynamic power in the design through physical implementation, experiments suggest that there is a
need to control input pin capacitance. This can be achieved by merging simple cells into complex cells.
This should be done while keeping an eye on congestion, which may increase owing to higher pin density, in
which case keepouts may be necessary for such cells. Apart from complex cells, usage of register banks
also helps in reducing overall dynamic power in designs.
Also, to help the implementation tool, it is suggested to use the full variety of drive strengths and VT cells, which
leads to lower area and a very healthy drive strength distribution.
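The link between pin capacitance and dynamic power follows from the standard switching power relation P = a * C * V^2 * f. The sketch below compares a pair of simple cells against one merged complex cell. The 1 GHz frequency matches the design stats; the supply voltage, activity factor and all capacitance values are hypothetical and serve only to show the direction of the effect.

```python
def dynamic_power(activity, cap_farads, vdd, freq_hz):
    """Switching power in watts: activity * C * Vdd^2 * f."""
    return activity * cap_farads * vdd ** 2 * freq_hz


FREQ_HZ = 1.0e9   # 1 GHz, from the design stats
VDD = 0.8         # assumed supply voltage
ACTIVITY = 0.1    # assumed switching activity

# Two simple cells in series: the intermediate net adds an input pin
# capacitance and a short wire (hypothetical values, in farads).
simple_caps = [1.0e-15, 1.0e-15, 0.4e-15]  # pin, pin, intermediate wire
# One complex cell implementing the same function: a single set of input pins.
complex_caps = [1.2e-15]

p_simple = dynamic_power(ACTIVITY, sum(simple_caps), VDD, FREQ_HZ)
p_complex = dynamic_power(ACTIVITY, sum(complex_caps), VDD, FREQ_HZ)
print(f"simple cells : {p_simple * 1e9:.1f} nW")
print(f"complex cell : {p_complex * 1e9:.1f} nW")
```

The saving per instance is small, but in a pin-capacitance-dominated node it accumulates over millions of cells, at the cost of the higher pin density noted above.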

7.2 Tool Approach
To recover power in PrimeTime, priority should be given to dynamic power. As a suggested flow, the
following can be used:
PrimeTime has been observed to be highly efficient in power recovery, with the following drive strength
and VT distribution before and after power recovery.
8. Conclusion
16 FF+ comes with advantages in power, performance and cost. The challenges due to the structure change cannot be
ignored. Temperature inversion, layer stack resistance and crosstalk pose difficulties in timing analysis and closure.
The challenges of temperature inversion could be solved by hybrid scenario creation, while Layer_opt and
crosstalk-aware optimizations could solve the issues of timing closure and the timing correlation gap.
The power recovery flow was updated to adapt to a dynamic power dominant technology node, and the tool gave a
favorable VT distribution and drive strength distribution.
With the proposed flow, we can get the best out of 16 FF.

9. References
1. Benjamin M., “Low Power Design in the FinFET Technology”, SNUG Santa Clara, 2014.
2. Synopsys, ICC Manual, “[email protected] user manual”.
