Parallel Per-Critical-Clock (PPCC) Logic Synthesis and Netlist Fusion For Best PPA and Convergence

Sandip Sar, Amrit Amresh, Joby Abraham

Qualcomm
Bangalore, India

https://www.qualcomm.com

ABSTRACT
For complex designs with multiple high-frequency clocks, today's logic synthesis tools cannot
achieve the best possible PPA (Performance-Power-Area), convergence and runtime because of
limitations in their methodology. This paper discusses these limitations in the available logic
synthesis methodologies. As a solution, we present a logic synthesis framework called PPCC
(parallel per-critical-clock) synthesis, which breaks the problem into pieces and introduces
parallelism: we launch multiple PPCC synthesis runs, each achieving the best PPA for one critical
clock. At the end, the resulting netlists are blended together to create the final synthesized netlist.
SNUG 2019

Table of Contents
1. Introduction – How logic optimization happens today
2. Problem in the present methodology
2.1 Design exploration quits without achieving best PPA
2.2 Design convergence challenge for complex design
2.3 High optimization runtime
3. PPCC solution introduced
3.1 Identification of critical clock candidates
3.2 Identification of corresponding sub-module scopes
3.3 Initiate multiple PPCC synthesis optimization
3.4 Netlist fusion
4. Superiority of proposed solution
4.1 Better Timing
4.2 Lesser Area
4.3 Run time reduction
4.4 Predictability of the solution and faster design convergence
5. Experiments and Results
6. Conclusions

Table of Figures
Figure 1. PPCC flow chart

Table of Tables
Table 1. Frequency targets
Table 2. Timing and runtime results for default synthesis run
Table 3. Timing and runtime results for CLKa PPCC
Table 4. Timing and runtime results for CLKb PPCC
Table 5. Timing and runtime results for CLKc PPCC

1. Introduction – How logic optimization happens today


Logic synthesis optimization becomes tricky for complex designs in which multiple high-frequency
clocks have challenging critical timing paths to meet. The optimization tool creates a separate path
group for each clock and optimizes the groups sequentially. The tool builds a global optimization cost
function (GOC) that reflects the total violation in the design across the optimization targets, such as
performance, area, leakage power and dynamic power, as well as design rule costs (DRC) such as
max_capacitance, max_fanout and max_transition. The tool uses the reduction in this cost function to
qualify optimization techniques. For example, if method 1 reduces the total cost from 1000 to 500 and
method 2 reduces it from 1000 to 800, method 1 is preferred. For each path group, the tool applies
optimization techniques to bring down the GOC value. When it finds no further scope for improving
the GOC, it moves on to the next path group. The GOC is therefore an important determining factor in
which optimizations are taken up and in the final quality of the synthesized netlist.
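
The selection rule described above can be sketched in a few lines of Python. This is only an illustration of "prefer the technique that leaves the lowest cost"; the names Technique and pick_technique are hypothetical, and a real synthesis tool's cost evaluation is far more involved than a single number per technique.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Technique:
    """A candidate optimization technique and the cost it would leave behind."""
    name: str
    cost_after: float  # hypothetical GOC value if this technique is applied


def pick_technique(cost_before: float, candidates: List[Technique]) -> Optional[Technique]:
    """Return the candidate with the largest cost reduction, or None if none helps."""
    improving = [t for t in candidates if t.cost_after < cost_before]
    if not improving:
        return None  # no further scope: move on to the next path group
    return min(improving, key=lambda t: t.cost_after)


# The example from the text: both methods start from a cost of 1000
candidates = [Technique("method 1", 500.0), Technique("method 2", 800.0)]
best = pick_technique(1000.0, candidates)
print(best.name)  # method 1
```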

2. Problem in the present methodology


The above methodology falls short in three respects for complex designs with multiple high-frequency clocks:
1) It fails to achieve the best possible PPA (Performance-Power-Area).
2) Design convergence takes longer and is less predictable.
3) Synthesis runtime is longer.

2.1 Design exploration quits without achieving best PPA


The global optimization cost (GOC) accumulates for an individual path group whenever the PPA
(Performance-Power-Area) achieved within that group falls short of its target. This accumulated
cost becomes a burden for the synthesis tool when it moves on to optimize logic in a different
clock domain. In the end, the cost function becomes so large that an individual technique may not
bring about a substantial reduction in the total optimization cost, and so it gets dropped. As a result,
the tool gives up on optimizing these paths and outputs a sub-optimal netlist.
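
A small numeric sketch may help show why an accumulated cost hides a useful technique. It assumes the tool accepts a technique only when it reduces the total cost by some minimum fraction. That acceptance rule and the 5% threshold are assumptions made for illustration, not documented tool behaviour, and the cost values are invented.

```python
def is_accepted(total_cost: float, reduction: float, min_relative_gain: float = 0.05) -> bool:
    """Accept a technique only if it cuts the total cost by at least min_relative_gain.

    The 5% default is a made-up illustration value, not a tool setting.
    """
    return reduction / total_cost >= min_relative_gain


reduction = 100.0  # cost removed by one technique on the current clock's path group

# Cost seen when this clock is optimized on its own: a 10% gain
alone = is_accepted(total_cost=1000.0, reduction=reduction)

# Same technique after other clock domains have left 4000 units of unmet cost: a 2% gain
burdened = is_accepted(total_cost=5000.0, reduction=reduction)

print(alone, burdened)  # True False
```

Under this assumed rule the same technique is kept in the first case and dropped in the second, even though its absolute benefit is unchanged.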

2.2 Design convergence challenge for complex design


One recipe for further optimizing critical paths is to isolate the problematic logic in a new path
group. We can also assign a higher weight to a path group, which acts as a directive to the
optimization tool to prioritize that path group above the other clock groups. Putting extra weight
on one critical clock improves PPA in that clock domain but often degrades PPA in another
competing critical clock domain. To reach a unified synthesis recipe for a design with multiple
critical clocks, the solutions for the individual clock groups must be combined and then given
proportionate weights. However, this is an empirical step that needs multiple iterations and
considerable convergence time. The convergence time and effort multiply as the number of critical
clocks in the design increases. As a result, the number of iterations required to close a complex
design cannot be predicted.

2.3 High optimization runtime


Because optimization happens sequentially, one critical clock at a time, synthesis runtime increases
sharply when there are multiple competing critical clock domains. This becomes a practical
hindrance to design convergence, especially when multiple synthesis iterations are required.

3. PPCC solution introduced


This paper presents a logic synthesis framework called PPCC (parallel per-critical-clock) synthesis,
which breaks the problem into pieces and introduces parallelism to address the three problems
discussed above. The methodology works towards reducing the global optimization cost (GOC) so
that the optimization tool can do a better job. The sections below describe the proposed flow in detail.

3.1 Identification of critical clock candidates


The flow identifies all the competing critical clocks in the design that run asynchronous to each
other, along with their corresponding submodule scopes. The clock candidates should be chosen
judiciously, because the number of child PPCC runs grows with the number of critical clocks identified.
Suppose a design has a total of “T” clocks, of which “N” are asynchronous to each other and have
timing-critical paths; these “N” critical clocks are the primary candidates for PPCC runs.
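
As a rough sketch of this selection step, the snippet below filters “T” clocks down to the “N” candidates. It assumes that a baseline timing report supplies a worst negative slack per clock and that clock relationships are available as a set of synchronous pairs. The Clock type, the function name and all values are hypothetical. Critical clocks that are synchronous to another critical clock are simply left out here; how to treat them (for example by sharing one run) is a judgment the paper leaves to the designer.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Clock:
    name: str
    wns_ns: float  # worst negative slack from a baseline run; negative means violating


def critical_clock_candidates(clocks, synchronous_pairs):
    """Pick clocks that have violating paths and are asynchronous to each other.

    synchronous_pairs is a set of frozenset({name_a, name_b}) for related clocks.
    """
    critical = [c for c in clocks if c.wns_ns < 0.0]
    related = set()
    for a, b in combinations(critical, 2):
        if frozenset({a.name, b.name}) in synchronous_pairs:
            related.update({a.name, b.name})
    # Keep only critical clocks that are asynchronous to every other critical clock
    return [c.name for c in critical if c.name not in related]


# T = 5 clocks in total; invented slack values
clocks = [
    Clock("CLK1", -0.20), Clock("CLK2", -0.30), Clock("CLK3", -0.10),
    Clock("CLK4", 0.05), Clock("CLK5", 0.00),
]
candidates = critical_clock_candidates(clocks, synchronous_pairs=set())
print(len(clocks), candidates)  # 5 ['CLK1', 'CLK2', 'CLK3']
```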

3.2 Identification of corresponding sub-module scopes


The second input to the flow is the set of submodules in the design that are clocked by the “N”
critical clocks identified in section 3.1. If a submodule hierarchy is clocked solely by one of the “N”
critical clocks, the submodule is selected for that clock. If two or more of the critical clocks interact
within a particular submodule, that submodule becomes an optimization scope for each of the
interacting clocks. Consider a scenario in which a design has 3 critical clocks, CLK1, CLK2 and CLK3,
which are asynchronous to each other. SubDes1 is clocked solely by CLK1, and SubDes2 receives both
CLK2 and CLK3. In this scenario, SubDes1 qualifies under the CLK1 PPCC run and SubDes2 qualifies
under both the CLK2 and CLK3 PPCC runs.
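
The scenario above maps directly onto a small lookup. The sketch assumes the clocks reaching each submodule are already known (for example from a clock-network report); the dictionary layout and the function name are illustrative only. Clocks that are not in the critical set are ignored, since the text does not say how mixed critical and non-critical submodules should be handled.

```python
from collections import defaultdict

# Hypothetical input: the clocks that reach each submodule
clocks_in_submodule = {
    "SubDes1": {"CLK1"},
    "SubDes2": {"CLK2", "CLK3"},
}
critical_clocks = {"CLK1", "CLK2", "CLK3"}


def scopes_per_clock(clocks_in_submodule, critical_clocks):
    """Return {critical clock: [submodules that form its optimization scope]}."""
    scopes = defaultdict(list)
    for submodule, clocks in sorted(clocks_in_submodule.items()):
        for clock in sorted(clocks & critical_clocks):
            scopes[clock].append(submodule)
    return dict(scopes)


scopes = scopes_per_clock(clocks_in_submodule, critical_clocks)
print(scopes)
# {'CLK1': ['SubDes1'], 'CLK2': ['SubDes2'], 'CLK3': ['SubDes2']}
```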

3.3 Initiate multiple PPCC synthesis optimization


Once the critical clocks and their sub-module scopes are identified, the flow initiates multiple PPCC
synthesis runs. In each run only one critical clock (in some cases more than one) is given priority,
and the other competing critical clocks are deprioritized, either by relaxing their PPA targets or by
putting a lower weight on those clock domains. In each PPCC synthesis run, boundary optimization
is turned off on the targeted sub-modules to prevent any alteration to their interfaces as well as any
optimization across their logic boundaries (logic sharing, inversion push etc.). Inside an identified
submodule, of course, the tool is free to perform all required optimizations. For placement-aware
synthesis, physical bounds must be provided as an extra input. The bounds for different submodules
must not overlap. This ensures that the targeted submodule logic from different clock domains does
not overlap physically in the resulting netlist for the given floorplan.
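
The non-overlap requirement on physical bounds is easy to check before launching the runs. The sketch below assumes each bound is an axis-aligned rectangle in floorplan coordinates; the Bound type, the coordinates and the third submodule are invented for illustration, and real bounds may be rectilinear rather than rectangular.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Bound:
    """Axis-aligned rectangular placement bound, in floorplan units."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a: Bound, b: Bound) -> bool:
    """True if the two rectangles share interior area (touching edges are allowed)."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def overlapping_pairs(bounds):
    """List every pair of bounds that violates the non-overlap requirement."""
    return [(a.name, b.name) for a, b in combinations(bounds, 2) if overlaps(a, b)]


bounds = [
    Bound("SubDes1", 0, 0, 100, 100),
    Bound("SubDes2", 100, 0, 200, 100),   # shares only an edge with SubDes1
    Bound("SubDes3", 150, 50, 250, 150),  # intrudes into SubDes2
]
violations = overlapping_pairs(bounds)
print(violations)  # [('SubDes2', 'SubDes3')]
```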

3.4 Netlist fusion


Once all the PPCC runs are complete, the flow extracts the targeted sub-designs from the individual
PPCC netlists and blends them at the top level to produce the final synthesized netlist. Suppose
netlist1 is optimized for CLK1, which runs into SubDesA and SubDesB, and netlist2 is optimized for
both CLK2 and CLK3, which run into SubDesC. In that case the flow extracts SubDesA and SubDesB
from netlist1 and SubDesC from netlist2 and stitches them together to create the final top-level
netlist. Figure 1 depicts the proposed flow.

Figure 1: PPCC flow chart
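
The fusion step of section 3.4 amounts to choosing, for every targeted sub-design, the netlist it should be taken from. The sketch below models each netlist as a dictionary of submodule definitions, with placeholder strings standing in for gate-level module bodies. The function name and data layout are assumptions for illustration and do not represent the authors' implementation or any tool command.

```python
def fuse(netlists, pick_from):
    """Build the top-level submodule set by taking each submodule from its chosen netlist.

    netlists:  {netlist_name: {submodule_name: optimized_definition}}
    pick_from: {submodule_name: netlist_name}
    """
    fused = {}
    for submodule, source in pick_from.items():
        if submodule not in netlists[source]:
            raise KeyError(f"{submodule} not found in {source}")
        fused[submodule] = netlists[source][submodule]
    return fused


# Every run synthesizes the whole design, so each netlist contains all submodules
netlists = {
    "netlist1": {"SubDesA": "A@run1", "SubDesB": "B@run1", "SubDesC": "C@run1"},
    "netlist2": {"SubDesA": "A@run2", "SubDesB": "B@run2", "SubDesC": "C@run2"},
}
pick_from = {"SubDesA": "netlist1", "SubDesB": "netlist1", "SubDesC": "netlist2"}
fused = fuse(netlists, pick_from)
print(fused)
# {'SubDesA': 'A@run1', 'SubDesB': 'B@run1', 'SubDesC': 'C@run2'}
```

Stitching like this is only safe because boundary optimization was turned off on the targeted submodules, so their interfaces are identical in every run.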

4. Superiority of proposed solution


4.1 Better Timing
A PPCC (Parallel Per-Critical-Clock) synthesis run gives priority to only one (or a selected few) critical
clock(s); all other competing critical clocks are deprioritized, either by relaxing their PPA targets or
by lowering their weights. The deprioritized clocks meet their relaxed PPA targets easily, so they add
negligible or no cost to the total optimization cost. This ensures that the synthesis engine
concentrates its optimization effort solely on the targeted clock. Hence each individual netlist has
better timing closure for its targeted critical clock.

4.2 Lesser Area


For the same reason as stated in section 4.1, PPCC ensures the best possible area footprint for the
targeted sub-modules. When the final top netlist is written out by stitching the individual
submodules together, it retains that area footprint.

4.3 Run time reduction


All the PPCC synthesis optimizations take place in parallel, so the overall synthesis runtime in this
methodology equals the longest of the PPCC runtimes. Because each PPCC run solves the problem of
only one clock domain, its runtime is shorter than that of a multiple-clock synthesis run. The net
result is a reduction in overall runtime.

4.4 Predictability of the solution and faster design convergence


Predictability of design convergence goes down when the optimization recipe for one clock degrades
the optimization of other clock domains. One cannot be sure upfront whether fixing some paths in one
clock domain will break paths in another. By moving to the PPCC methodology, the tool solves one
problem at a time. Because the other clocks do not interfere, convergence is much more predictable.

5. Experiments and Results


We picked a design with an instance count of 2.3M and 3 competing clocks: CLKa, CLKb and CLKc.
The frequency targets and register counts under these clocks are as follows:

Clock   Frequency (MHz)   Register count
CLKa    400               72K
CLKb    430               126K
CLKc    400               7K
Table 1: Frequency targets

Here CLKa, CLKb and CLKc are asynchronous to each other, and the design has distinct hierarchies
SubDESa, SubDESb and SubDESc, which work under CLKa, CLKb and CLKc respectively. Three PPCC
synthesis runs were invoked, followed by a fusion step:
1) The first PPCC synthesis run targeted the CLKa clock and the SubDESa sub-module scope.
CLKa was kept at its target frequency, and CLKb and CLKc were lowered to 100 MHz.
Boundary optimization across SubDESa was turned off and its interfaces were kept intact.
2) The second PPCC synthesis run targeted the CLKb clock and the SubDESb sub-module scope.
CLKb was kept at its target frequency, and CLKa and CLKc were lowered to 100 MHz.
Boundary optimization across SubDESb was turned off and its interfaces were kept intact.
3) The third PPCC synthesis run targeted the CLKc clock and the SubDESc sub-module scope.
CLKc was kept at its target frequency, and CLKa and CLKb were lowered to 100 MHz.
Boundary optimization across SubDESc was turned off and its interfaces were kept intact.
4) At the end, the top netlist was created by taking SubDESa from the first netlist,
SubDESb from the second netlist and SubDESc from the third netlist.
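
The three run setups listed above follow one pattern, which the sketch below generates from the Table 1 targets. The dictionary layout and the function name are illustrative assumptions; in practice these values would be turned into per-run clock constraints and tool settings.

```python
TARGET_MHZ = {"CLKa": 400, "CLKb": 430, "CLKc": 400}   # frequency targets from Table 1
SCOPE = {"CLKa": "SubDESa", "CLKb": "SubDESb", "CLKc": "SubDESc"}
RELAXED_MHZ = 100  # frequency used for the deprioritized clocks


def ppcc_run_plans(target_mhz, scope, relaxed_mhz):
    """Return one run description per critical clock."""
    plans = []
    for prioritized in target_mhz:
        plans.append({
            "prioritized_clock": prioritized,
            "scope": scope[prioritized],
            "boundary_optimization": False,  # kept off for the scoped submodule
            "clock_mhz": {
                clk: (mhz if clk == prioritized else relaxed_mhz)
                for clk, mhz in target_mhz.items()
            },
        })
    return plans


plans = ppcc_run_plans(TARGET_MHZ, SCOPE, RELAXED_MHZ)
for plan in plans:
    print(plan["prioritized_clock"], plan["scope"], plan["clock_mhz"])
# CLKa SubDESa {'CLKa': 400, 'CLKb': 100, 'CLKc': 100}
# CLKb SubDESb {'CLKa': 100, 'CLKb': 430, 'CLKc': 100}
# CLKc SubDESc {'CLKa': 100, 'CLKb': 100, 'CLKc': 400}
```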

The tables and charts below compare the timing results, logic area and compile runtime of the
PPCC netlist against the normal/default synthesis.

Clock   Frequency (MHz)   WNS (ns)   FEP
CLKa    400               -0.276     6751
CLKb    430               -0.353     12521
CLKc    400               -0.076     384
Runtime: 510 minutes
Table 2: Timing and runtime results for default synthesis run

Clock   Frequency (MHz)   WNS (ns)   FEP
CLKa    400               -0.182     4211
CLKb    100               0          0
CLKc    100               0          0
Runtime: 430 minutes
Table 3: Timing and runtime results for CLKa PPCC

Clock   Frequency (MHz)   WNS (ns)   FEP
CLKa    100               0          0
CLKb    430               -0.213     6987
CLKc    100               0          0
Runtime: 465 minutes
Table 4: Timing and runtime results for CLKb PPCC

Clock   Frequency (MHz)   WNS (ns)   FEP
CLKa    100               0          0
CLKb    100               0          0
CLKc    400               -0.006     17
Runtime: 415 minutes
Table 5: Timing and runtime results for CLKc PPCC
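
The per-clock gains implied by Tables 2 to 5 can be worked out as follows. This compares each clock's result in its own PPCC run with its result in the default run, and FEP is read here as the count of failing endpoints. Both the comparison and the assumption that the fused netlist keeps these per-run figures are ours; the tables do not report timing for the fused netlist.

```python
# (WNS in ns, FEP) for each clock at its target frequency
default_run = {"CLKa": (-0.276, 6751), "CLKb": (-0.353, 12521), "CLKc": (-0.076, 384)}
ppcc_runs = {"CLKa": (-0.182, 4211), "CLKb": (-0.213, 6987), "CLKc": (-0.006, 17)}


def improvement(default_run, ppcc_runs):
    """Return {clock: (WNS gain in ns, FEP reduction)} for the prioritized clocks."""
    result = {}
    for clock, (wns_default, fep_default) in default_run.items():
        wns_ppcc, fep_ppcc = ppcc_runs[clock]
        result[clock] = (round(wns_ppcc - wns_default, 3), fep_default - fep_ppcc)
    return result


gains = improvement(default_run, ppcc_runs)
for clock, (wns_gain, fep_cut) in gains.items():
    print(f"{clock}: WNS improves by {wns_gain:.3f} ns, FEP drops by {fep_cut}")
# CLKa: WNS improves by 0.094 ns, FEP drops by 2540
# CLKb: WNS improves by 0.140 ns, FEP drops by 5534
# CLKc: WNS improves by 0.070 ns, FEP drops by 367
```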

Chart 1: WNS comparison between default and PPCC synthesis

Chart 2: Logic area comparison between default and PPCC synthesis

Chart 3: Runtime comparison between default and PPCC synthesis

The overall runtime of PPCC synthesis is the maximum of the individual PPCC runtimes. In this
example, the CLKb PPCC run has the longest compile time (465 minutes), so the overall PPCC
runtime can be taken as 465 minutes.
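
The same arithmetic, written out with the runtimes from Tables 2 to 5. The time taken by the fusion step is not reported in the paper and is therefore left out; the total machine time is added only to make the cost of parallelism visible.

```python
default_runtime_min = 510                                   # Table 2
ppcc_runtime_min = {"CLKa": 430, "CLKb": 465, "CLKc": 415}  # Tables 3 to 5

# Runs execute in parallel, so wall-clock time is set by the slowest run
slowest_clock = max(ppcc_runtime_min, key=ppcc_runtime_min.get)
overall_ppcc_min = ppcc_runtime_min[slowest_clock]
saving_min = default_runtime_min - overall_ppcc_min
saving_pct = 100.0 * saving_min / default_runtime_min

# Total machine time is the sum of all runs, which exceeds the default run
total_compute_min = sum(ppcc_runtime_min.values())

print(slowest_clock, overall_ppcc_min)    # CLKb 465
print(saving_min, round(saving_pct, 1))   # 45 8.8
print(total_compute_min)                  # 1310
```

In wall-clock terms the PPCC flow finishes 45 minutes (about 8.8%) earlier than the default run in this experiment, at the price of running three jobs concurrently.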

6. Conclusions
PPCC was also tried on a couple of other cores, and the methodology responded in a similar way on
each. The PPCC netlist achieved better timing and area as well as a shorter compile runtime. In
addition, design convergence and the predictability of convergence improve significantly with PPCC.
The benefit of PPCC is expected to grow as design complexity grows.
