Abstract
The demand for high-performance electronic gadgets has increased twofold in the last decade, pushing technology
manufacturers to shrink fabrication node sizes. The decreasing channel sizes along with an increase in gate count and cell
density pose numerous congestion issues during physical implementation of the chips, making design closure ever more
difficult. Double Data Rate (DDR) memories that access data on both edges of the clock cycle require extreme timing control
and must meet the strict timing requirements during Physical Design (PD). Floor-plan, being the first stage of back-end
PD implementation, is an important step to mitigate congestion and timing issues during the subsequent stages of the
implementation. On-chip macros, with connections to the standard cells and the Input/Output (IO) ports of the chip, need
to be strategically placed during the floor-plan of the design to enable congestion-free placement of standard cells and
signal routes. Previously, designers opted for the island macro placement strategy, wherein macros were grouped close together,
thereby leaving a uniform square region for standard cell placement. However, this method alone cannot be considered for
chip designs today that have denser macro pin connections to the chip IO ports, as in the Last Level Cache (LLC) block of a
DDR subsystem. In this paper, two new placement strategies, peripheral and donut, have been considered for the LLC
module. A congestion-optimized, floor-plan to Place and Route (PNR) flow methodology has been presented for each of
these placement strategies using Cadence Innovus Implementation System and Synopsys IC Compiler II. The Quality of
Results (QOR) for each strategy was then compared. The peripheral macro placement strategy is found to be the best among
the three, while the donut macro placement is the worst. A 16% improvement in the overall on-chip delay is seen in the
peripheral macro placement when compared to island macro placement. Furthermore, a 19.6% power reduction is observed
in the peripheral macro placement strategy as compared to island macro placement. The overall congestion for peripheral
macro placement is 0.32%, which is the least among the three strategies. Hence, the peripheral macro placement strategy
proves to be the best choice for macro placement, when considering floor-plan for the LLC module in a DDR subsystem.
Keywords
Double data rate, Physical design, Floor-plan, Macro placement, Island, Peripheral, Donut, Congestion.
usage of wearable sensors and smart devices [2]. The memory requirements for such devices are stringent due to real-time data access requirements, imposing strict timing constraints. The use of large-scale access control lists in IoT applications [3] furthers the need for high-speed data access solutions.

The DDR memory subsystem consists of the controller, physical interface, and Input/Output (I/O) drivers. One of the important blocks within the DDR is the Last Level Cache (LLC). The LLC is a standalone memory inserted between the external memory and functional blocks to provide another level of cache. The LLC is the last memory level to be checked on-chip before moving to fetch data from external memory. The time to access data from an off-chip memory is very high; hence the LLC, acting as a buffer cache, helps to reduce off-chip data fetches. Since the DDR is at the interface of the chip and off-chip memory, extreme timing control is required to ensure the correct functioning of the DDR, requiring several hardware components and algorithms to facilitate this complex design. Numerous architectural optimizations such as deep pipelines, branch prediction, and aggressive reordering aim to provide high performance [4]. The substantial research carried out to improve the efficiency of this subsystem has increased its gate-level complexity and power consumption. The LLC and Dynamic Random-Access Memory (DRAM) alone are observed to account for 33% of the power consumed in the DDR subsystem [5]. The high gate density and critical timing due to the physical interface with the off-chip memory need to be taken care of at the PD implementation level.

Physical Design (PD) implementation is a back-end flow from the net-list to Graphic Data Stream (GDS) and is the correlation step between design and chip manufacture. The PD flow ensures that the design created works on the silicon chip. Numerous problems arise when the design is converted to one that can function properly on silicon. The PD flow first involves proper planning and placement of pins and custom macros on the chip during the floor-plan stage. Next, the placement of the logic on the chip is performed along with the introduction of the clock and power distribution networks. The design is then routed and checked for various parameters such as power, area, and performance. The congestion and timing requirements need to be met during PD to facilitate the correct functioning of the design. All blocks of the chip need to be implemented and tested according to this PD implementation flow. As the complexity of Very Large-Scale Integration (VLSI) systems grows rapidly, physical planning is becoming increasingly difficult [6].

Various floor-plan techniques have been explored in the past, such as a partition-level floor-plan method that examines the in-depth structure of the block to decide the floor-plan and obtain better timing [7].

Challenges such as large design sizes, increasing macro count, timing/power estimations, region shaping and pin assignment, predefined placement locations, macro orientations and pin positions, simultaneous standard cell and macro placement, congestion, and timing-driven placement are increasing for a floor-plan designer [8].

Macro placement is a crucial step to obtain congestion-free designs at the later stages of the PD flow. The placement of standard cells, which is done by the placement tool, ideally requires a uniform square region on-chip to perform an optimum placement. To satisfy this requirement, designers initially employed the island macro placement configuration for floor-plan designs, which groups all macros in one corner of the chip to provide such a uniform region for standard cells. However, as the macro pin connections to the IO ports of the chip grow denser, this method becomes inefficient and often leads to more congested designs. Therefore, there is a need to explore different macro placement strategies to avoid such congested designs. In this paper, two new macro placement strategies, peripheral macro placement and donut macro placement, have been explored for the LLC block, and a complete congestion-optimized Place and Route (PNR) flow for each of these has been implemented using Cadence Innovus Implementation System and Synopsys IC Compiler II. The various inbuilt settings of these tools have been leveraged to optimize congestion and improve timing integrity throughout the PNR flow. The Quality of Results (QOR) of the three macro placement strategies was then compared to arrive at the best choice for macro placement for LLC modules in DDR subsystems.

2. Literature review
A detailed study of state-of-the-art architectures of DDR and LLC was carried out. Various developments in the architecture of both blocks have been explored in the past decade to reduce the latency and power of each. These developments help to appreciate the complexity involved during the PD process to ensure the proper functioning of the blocks. Numerous floor-plan techniques have also been developed in the
International Journal of Advanced Technology and Engineering Exploration, Vol 8(80)
past to improve congestion and timing of the blocks during back-end flow.

A Power Delivery Network (PDN) in a Re-Distribution Layer (RDL) for a DDR memory subsystem is presented in [9]. It is observed that decreasing the PDN loop inductance is critical for a robust high-speed PDN design. The loop inductance depends on the wire width and length. Hence, the wire parameters used for the power distribution network must be altered. A voltage ripple reduction between the Power/Ground (PG) rails is achieved with a simple PDN model. The voltage ripple reduction results from opting for a symmetric PG PDN structure, and a unity PG ratio is a must for maintaining the power integrity of the design while keeping signal integrity at an optimum level. The DDR memory controller is the brain of the entire DDR subsystem; hence, an optimized controller design as discussed in [10] can improve the performance of the overall subsystem. All commands, such as read/write access and pre-charge commands, were tested and verified. The verification was done in SystemVerilog to provide high code coverage and ensure the correct functioning of the block. The controller was designed to generate timing and control signals to synchronize the command operations. The drawback of the above design is the increased number of buffers that are inserted. The inserted buffers result in extra delay in the data paths, which severely affects the timing closure of the designs.

The power rail noise limit is determined by the DDR to interface current profile and PDN impedance [11]. The dynamic behavior of the memory subsystem greatly increases the power rail noise due to the sudden charge and discharge of current through the Static Random-Access Memory (SRAM) cells. The on-die PDN is studied using solution space analysis, wherein the power rails are decomposed into lumped on-die capacitors and effective series resistance. Different currents and voltages are applied to emulate the various operating conditions to estimate the overall voltage drop. The analysis shows that higher capacitance and lower series resistance lower the voltage drop. A design of a freeway Network-on-Chip (NoC) is proposed in [12], which routes flits on DDR and allows bypass pipelining. Pipeline bypassing reduces the packet latency at a low traffic load. The routing is done in such a way that only flits moving straight can pass through the bypass pipeline. In smaller networks, the freeway latency is found to be 49% higher than short-path, but in large networks, the freeway-NoC latency is 5% lower with a 23% increase in throughput. A sensitivity analysis to estimate the signal and power integrity of a PDN for a DDR is presented in [13]. A synthesized Resistance-Inductance-Capacitance (RLC) model is proposed to perform model extraction instead of the Computer-Aided Design (CAD) layout model extraction. The model is created using self and transfer impedance equations that can be incorporated into an algorithm. The models are created quickly and efficiently and match the CAD layout extraction models very closely. The models are passive and causal, and correlation is good in both the frequency and time domains. The above method produces faster analysis results while maintaining the accuracy of the CAD layouts.

With advancements in DDR, rapid advancement in LLC technology has also taken place to keep up with fast-growing memory needs. Several developments in LLC have taken place in the past decade. One such development is in stacking technology. The increased parallelism in LLC has resulted in opting for 3D stacking as compared to the traditional 2D stacking. However, the leakage power is seen to increase greatly due to dense 3D integration [14]. A novel hybrid reconfigurable architecture for LLC was proposed. The new design combines SRAM with Spin Transfer Torque (STT) SRAM technology to dynamically reduce power at runtime by restoration and duplication. The power is seen to reduce by 98.4% as compared to the traditional design. A cache-partitioning algorithm is used to efficiently divide the LLC block among the different processors. A novel method to partition cache using Non-Volatile Memories (NVM) instead of SRAM is presented in [15]. The cache is periodically partitioned in such a way as to assign heavily accessed ways to low-accessed partitions, thereby distributing the accesses across the entire LLC block.

Sakhare et al. [16] presented the replacement of SRAM-based LLC with STT Magnetic Random Access Memory (MRAM) based LLC, due to the limited scaling capability of SRAM. The STT-MRAM based design proves to provide larger energy gains and low access latency. Two more designs, Compressed Tag (CT) cache [17] and data shepherding [18], are presented to manage larger LLC blocks. The developments in DDR and LLC have increased the cell-level complexity and timing criticality during the PD flow. The larger number of logic cells inserted to optimize the design in terms of power results in a highly congested placement if care is not taken in preparing the floor-plan. Several floor-plan and macro
J. Fadnavis and Kariyappa B.S.
placement techniques have been explored to enable congestion-free standard cell placement and overcome the challenges of higher logic density.

A macro placement algorithm for regular placement of macros is presented in [19]. Macros and standard cells are clustered together in advance according to the connections between them, creating different hierarchies of macros. The macros are then legalized to obtain an efficient floor-plan. The simulated annealing algorithm combined with the corner stitching algorithm is explored for macro placement in [20]. This method is effective in refining the placement of standard cells along with macros according to the placement regions defined by the algorithm. A clustering algorithm for standard cells and macros, built as a tree from the design hierarchy during synthesis, is presented in [21], allowing the algorithm to consider the indirect connectivity of macros to the standard cells. This method is best used when the placement of macros and standard cells is done simultaneously.

A novel multi-level algorithm that considers the Register Transfer Level (RTL) connections between macros and standard cells is discussed in [22]. The synthesis net-list is divided based on dataflow hierarchy and a cost function is evaluated to optimize the wire length and timing of the connections. The proposed algorithm enables easy timing and Design Rule Check (DRC) closure. The impact of macro placement on the congestion of the design is assessed in [23]. Two different macro placement strategies are taken into consideration and the impact at each PD stage is evaluated. The congestion and QOR are observed at every step to assess the effect of macro placement. To ensure the timing closure of the design, several manual optimizations are required to meet the setup and hold times. Different methods to fix the setup and hold times are given in [24]. These methods provide a robust approach to timing closure and yield minimal DRCs during the sign-off phase of PD implementation. Various algorithms exist to group and slice up the cells in the gate-level net-list according to parameters such as maximum interconnect length and logical depth. One such algorithm is the Genetic and Simulated Annealing (GSA) algorithm [25], which is used to define weight values for different cells while clustering them to perform an efficient placement. Macro placement has further been explored as a fully automated solution using machine learning models in [26]. Several floor-plans with different macro placements have been provided to build a robust machine learning model that decides the optimum macro placement for given floor-plan specifications.

The above macro placement techniques, while accounting for the connections between macros and standard cells, do not account for the connections between macros and the I/O ports of the chip. In modules such as the LLC, the macros mostly connect to the I/O ports of the chip. These connections are of utmost importance in an LLC module, as they ultimately interface with the off-chip memory. Moreover, the above algorithms are provided for a full-custom PNR flow, where macros and standard cells are placed simultaneously, which requires such automated algorithms. However, since the LLC module is developed as a semi-custom design, the macros are placed first, followed by the standard cells. In this paper, two macro placement strategies are presented for a semi-custom flow that take into consideration the connections of macros to the standard cells as well as the I/O ports of the chip.

3. Methods
3.1 DDR subsystem
The System-on-Chip (SoC) design needs to interface with the off-chip memory as shown in Figure 1. The memory subsystem is shared and must respond to numerous requests from multiple cores, each having its own latency and bandwidth requirements. The processor, along with the Graphics Processing Unit (GPU) and Digital Signal Processor (DSP), interacts with the memory. To decrease the memory access times, the DDR subsystem acts as an interface between the processors and the memory. The DDR enables memory access on both edges of the clock cycle, as compared to traditional memory systems that access data on only one clock edge. One of the blocks in the DDR subsystem is the LLC. The LLC acts as an additional cache memory apart from the L1 and L2 caches. The LLC was added in an attempt to further reduce memory access times by reducing the frequency of off-chip data accesses.

The DDR subsystem has been increasingly employed in applications such as satellite navigation [27]. However, the physical interface and high-speed data access impose tight PD constraints on the module. Such new architecture furthers the need to meet timing requirements in all extreme corner cases to ensure the proper functioning of the memory interface across several environmental conditions. Hence, a detailed PD implementation is required to ensure the working of this subsystem.
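
As a rough illustration of why an LLC reduces the effective memory access time seen by the processors, the textbook average-memory-access-time relation (hit time plus miss rate times miss penalty) can be sketched in Python. The latencies and miss rate below are hypothetical values chosen only for illustration; they are not measurements from this work, and the function name is an assumption.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must lie in [0, 1]")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers (assumptions, not taken from the paper).
DRAM_ACCESS_NS = 100.0   # off-chip access latency
LLC_HIT_NS = 10.0        # on-chip LLC hit latency
LLC_MISS_RATE = 0.25     # fraction of LLC lookups that still go off-chip

# Without an LLC, every request that misses L1/L2 pays the off-chip latency.
without_llc_ns = DRAM_ACCESS_NS
# With an LLC, only the LLC misses pay the off-chip latency on top of the lookup.
with_llc_ns = average_access_time(LLC_HIT_NS, LLC_MISS_RATE, DRAM_ACCESS_NS)

print(f"without LLC: {without_llc_ns} ns, with LLC: {with_llc_ns} ns")
```

With these assumed values the LLC lowers the average cost of a request that misses the L1 and L2 caches; the benefit shrinks as the LLC miss rate rises.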
3.2 Overall methodology
The PD implementation starts with importing the gate-level net-list file, Synopsys Design Constraints (SDC) file, physical and technology files, timing liberty modules, and power intent file. The back-end flow then begins from the floor-plan stage. The first step is to define the core utilization and the aspect ratio as seen in Figure 2. The core to IO boundary is then decided to snap the corners of the instance grid. Next, the pin placement on the boundary is done based on the inputs from the top-level hierarchy module, and appropriate layers are assigned to the pins. Macro placement is carried out using the three different strategies: island, peripheral, and donut. The various placement blockages are then defined to ensure clean, congestion-free placement of standard cells. Placement regions are further defined, grouping cells of similar logic hierarchy together. The physical-only cells are then placed all over the core area. Finally, power rings and straps are generated based on the power intent of the design. A sanity check on the floor-plan is performed to ensure a clean design before placement.

The standard cell placement is performed by the tool using several inbuilt algorithms. The placement begins with the initial coarse placement that places the standard cells randomly according to the space available. This information is then used to perform optimization, adjusting cells to reduce congestion and meet timing. The next step is the refine incremental placement, wherein small perturbations are carried out iteration by iteration to optimize the design further. The final placement and optimization then take place, followed by legalization to snap the cells to the manufacturing grid. Next, the clock specifications need to be defined to lay out the clock network on the chip. To form the multi-clock tree, clock drivers are created, followed by clock strap generation. Once the clock mesh is ready, the global clock tree is built and checked. Next, the clock mesh is routed and the entire clock tree is synthesized and legalized.

The routing begins with routing the clock and certain critical nets. Next, the secondary power grid mesh is connected. The global route is then performed, where approximate routes are assigned and coarse congestion is calculated. Track assignment is the step where the tracks of different routing layers are assigned to the global routes. After track assignment, violations may exist, which are resolved during the detailed routing stage. Post-route optimization is performed to fix congestion and legalize the routing. Once routing is complete, the sign-off checks consisting of timing, congestion, area, and power analysis are performed. The setup and hold timings are fixed based on the timing reports generated, by sizing or replacing buffers and inverters. The DRC checks are performed to make sure the design is ready for manufacture and involve metal filling and Engineering Change Order (ECO) fixes.

The above methodology is followed for each of the three different macro placement strategies, leveraging the various tool options provided by Synopsys IC Compiler II, which are employed to implement a power- and performance-optimized design for each. The timing, power, and congestion values for each of the macro placement strategies were observed and analyzed at each stage.
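
The first floor-plan step in the methodology above, fixing the core utilization and aspect ratio, determines the core dimensions. A minimal Python sketch of that arithmetic follows. It assumes utilization is defined as total cell area divided by core area and aspect ratio as core height divided by width; tools may define these differently, and the function name and the input values are hypothetical.

```python
import math


def core_dimensions(cell_area_um2, utilization, aspect_ratio):
    """Return (width_um, height_um) of a rectangular core.

    Assumed definitions (not taken from the paper):
      utilization  = cell_area / core_area
      aspect_ratio = height / width
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must lie in (0, 1]")
    if aspect_ratio <= 0.0:
        raise ValueError("aspect_ratio must be positive")
    core_area = cell_area_um2 / utilization
    width = math.sqrt(core_area / aspect_ratio)
    height = aspect_ratio * width
    return width, height


# Hypothetical block: 600,000 um^2 of cells and macros, 60% utilization, square core.
width_um, height_um = core_dimensions(600_000.0, 0.6, 1.0)
print(f"core: {width_um:.1f} um x {height_um:.1f} um")
```

Lowering the utilization target enlarges the core and leaves more room for routing, at the cost of area.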
3.3.7 Addition of physical-only cells
The final stage of the floor-plan is the addition of the physical-only cells. These cells have no logical functionality. End-Cap cells are added at the edges of the core and around macros to protect the diffusion and poly layers during lithography. Well-Tap cells are added to provide the substrate of the transistors with the appropriate well voltage for proper functionality. Another type of cell added is the tie cell, which is used to tie a wire to either logic 1 or 0. These are used to interface between powered-down and always-on power domains.

3.4 Place and route methodology
3.4.1 Placement preparation
After the floor-plan is completed, the floor-plan specifications are written onto a Design Exchange Format (DEF) file. The inputs to the placement tool are the gate-level net-list, floor-plan DEF file, power intent file, timing module files, and reference library files. Sanity checks are performed on the floor-plan file and gate-level net-list. The power intent file is checked and any violations between the floor-plan data, gate-level net-list, and power intent are corrected. The floor-plan information is then loaded onto the tool. Next, the power intent is committed, which adds the isolation cells, retention cells, enable level shifters, and power muxes.
3.4.2 Optimization preparation
Placement optimization is an important step during the PD flow. Several parameters and tool options of the Synopsys IC Compiler II tool need to be set before optimization.
1. Setting target library files - The library files that should be used by the tool for optimization and clock tree synthesis should be defined by using the set_target_library_subset command.
2. Restricting library cells - The command set_lib_cell_purpose restricts the library cells used during optimization, clock tree synthesis, and setup/hold fixing. This reduces the tool runtime as only specific cells will be tried and tested for optimization.
3. Preventing optimization on cells - By setting the size_only option on some cells, optimization can be prevented on certain cells. This option is set on cells present in the clock paths.
4. Setting percentage low Voltage Threshold (VT) optimization - Low VT cells switch faster but have high leakage current. The set_max_lvth_percentage command restricts the use of low VT cells to a defined value and the tool weighs leakage against performance during optimization. In this design, the percentage low VT is set to 20.
5. Specifying routing resources - The minimum and maximum routing layers can be set globally and for specific nets. Layers to be ignored for Resistance-Capacitance (RC) estimation during optimization are also set using the set_ignored_layers command.
6. Defining placement bounds - Placement bounds are of move and group type. Move bounds have a fixed location and boundary, whereas group bounds have only a fixed boundary. The bounds are set to group cells of similar logic level to reduce wire length.
7. Enable power optimization - Dynamic power optimization is enabled for the design by using the command set_scenario_status -dynamic_power true.
8. Enabling congestion driven placement - The congestion effort can be controlled by set_app_options -name place_opt.congestion.effort -value high.
9. Enable global route estimation - The optimization engine makes use of a virtual route to estimate wire length for timing fixing. Global routing gives a more accurate estimate of the wire length, but increases the run time. In the design, global routing for placement and high fan-out net synthesis is enabled.
10. Performing magnet placement - Magnet placement is used to place certain logic cells close to objects to reduce the wire length. Certain macros are set to act as magnet objects for some logic cells to which they connect extensively.
3.4.3 Performing placement
The place_opt command is run to invoke the tool to run placement. Several optimization iterations are performed to get a placement optimized for congestion and timing. The placement is then legalized to snap the standard cells to the manufacturing grid. The placement is checked to resolve any violations before moving to the Clock Tree Synthesis (CTS) stage.
3.4.4 CTS
The CTS starts with deriving the clock trees and checking all clock constraints. The clock constraints must be specified for all clocks and a clock reference must be derived for all clock cells. The transition and capacitance for each input port must also be specified. The parameters that are set to prepare for CTS are as under:
1. Enable skew and target latencies - The tool tries to achieve the skew and target latency values required by specific designs during the optimization.
2. Enable local skew optimization and skew groups - The skew groups are a set of clock cells among which the skew must be balanced. Local balancing results in better-optimized timing for clock paths.
3. Specifying the primary corner - The optimization tool uses the set primary corner to resolve setup and hold violations. The primary corner defined is generally the extreme corner for which timing must meet the requirements.
4. Enabling dirty design mode - The constraints in the SDC file can get extremely tight, which increases the optimization run time. This setting is specified to get optimum results in less time, as the tool ignores a few constraints to meet timing.
5. Enabling global route - Global routing for clock nets is enabled to get accurate wire length timing values during optimization.
6. Enable Concurrent Clock and Data (CCD) optimization - The option is enabled to perform optimization on both clock and data paths. Buffers and inverters will be added to the data path to meet and balance timing.

Once the specifications are enabled, a clock tree can be built. The clock tree is first built by inserting the mesh and tap drivers across the core area. The clock mesh is then built using the create_clock_straps command. The global clock tree, which is generally an H-tree structure, is then built. The mesh and tap drivers are routed to the global clock tree and mesh, followed by synthesis and optimization of the entire clock tree. A tap driver connected to the various sinks is shown in Figure 9.
3.4.5 Routing
The routing parameters that are set before performing routing are:
1) Defining routing guides, blockages, and corridors - Routing guides are regions where specific routing characteristics such as horizontal and vertical track utilization, and preferred routing direction can be fixed. Routing blockages are areas where routing of certain layers is not allowed. Routing blockages are placed close to the pins to reduce routing congestion. Routing corridors are regions where the routing of some nets can be restricted.
2) Defining Non-Default Routing (NDR) for clock and signal nets - Certain nets require special route layer characteristics. The trunks of the clock tree are usually routed with a double-width layer, which is specified as an NDR rule. NDR rules are specified for certain nets after looking at logical connectivity.
3) Routing clock nets - The global routing, track assignment, and detail routing are performed for all clock nets.
4) Routing critical nets - Certain nets, as studied from the data flow logic, are considered critical nets. These nets must be routed first to fix them and prevent further optimization of these nets.

Once the clock and critical nets are routed, the routing of the entire design can be carried out. The routing engine first assigns global routes to all nets and the overflow in each global route cell is reported. Track assignment is then performed, which may leave certain violations. The detailed routing routes the nets completely and resolves violations. Post-route optimization is performed, which includes legalization of cells, incremental detail routing, and ECO routing.

3.5 Implementation specifications
The floor-plan of the LLC module was performed on Cadence Innovus Implementation System and the PNR flow was carried out using Synopsys IC Compiler II. The LLC module specifications are given below in Table 1.
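
The per-cell overflow reported by the global router in Section 3.4.5 can be illustrated with a short Python sketch. It assumes overflow is the number of routing tracks demanded beyond a global route cell's capacity, and that an overall congestion figure is the percentage of cells with non-zero overflow. Both definitions are assumptions for illustration; the routing tool's own metric may differ, and the demand and capacity grids below are invented.

```python
def gcell_overflow(demand, capacity):
    """Per-cell overflow: tracks demanded beyond capacity (0 when the demand fits)."""
    return [
        [max(0, d - c) for d, c in zip(demand_row, capacity_row)]
        for demand_row, capacity_row in zip(demand, capacity)
    ]


def congestion_percent(overflow):
    """Percentage of global route cells that have any overflow."""
    cells = [value for row in overflow for value in row]
    overflowed = sum(1 for value in cells if value > 0)
    return 100.0 * overflowed / len(cells)


# Hypothetical 2 x 4 grid of global route cells (track counts are made up).
demand = [
    [8, 12, 9, 7],
    [10, 10, 14, 6],
]
capacity = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]

overflow = gcell_overflow(demand, capacity)
print(overflow)                      # [[0, 2, 0, 0], [0, 0, 4, 0]]
print(congestion_percent(overflow))  # 25.0
```

Cells with positive overflow mark the regions that macro placement, blockages, and routing guides are meant to relieve.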
The floor-plan sanity check results are tabulated in Table 2. The observations from the sanity check are as follows:
1. The standard cell area for island macro placement is highest, followed by the donut and then peripheral configurations.
2. The blockage area for donut macro placement is the lowest, followed by peripheral macro placement and island macro placement. This shows that the non-uniformity of macro placement is highest for island placement, making it an inefficient macro placement strategy.
3. While the number of cell rows is highest for island macro placement, the number of unique cell rows is the least, which indicates less uniform standard cell placement. In this regard, the peripheral macro placement proves to be the best.
4. The core density and gate density are lowest for peripheral macro placement, moderate for donut placement, and highest for island placement. This indicates that the area available for clock and signal routing is larger for peripheral placement than for the other strategies, thereby reducing congestion of the core area.
5. The number of power switch cells and PG pins placed in the core area is highest for island macro placement, followed by peripheral and then donut. The power switch cells consume extra power, and a higher number of these leads to more power consumption of the chip. The peripheral macro placement is again observed to be best, with a moderate value for power consumption.
6. The Global Cell (GCell) route congestion count is a rough indication of the per-GCell congestion after routing takes place. The congestion is seen to be minimum for peripheral macro placement, indicating it to be the better macro placement choice among the three.
The QOR results at each step of the PD implementation flow for each macro placement strategy are given as follows:
1) After standard cell placement - Table 3 tabulates the QOR after standard cell placement. The observations are as follows:
i) The overall WNS value is seen to be negative as the design has not been optimized for timing. However, the WNS value is least negative for peripheral macro placement, indicating better timing QOR.
ii) The power is seen to be minimum for peripheral macro placement and highest for donut macro placement.
iii) The congestion value is moderate for all the strategies as the clock and signal routes have not been placed yet.
2) After CTS - The QOR comparison after CTS is tabulated in Table 4. The WNS value has reduced across all the macro placement strategies due to the post-placement optimization that occurs before CTS. The power has increased due to the extra power consumed by the clock controllers and cells. The routing congestion is also seen to have increased for each of these, as the available density for routing has reduced after the introduction of clock cells.
3) After routing - The WNS value for each macro placement strategy has become less negative, indicating an improvement in timing QOR, as seen in Table 5. The power is seen to increase further, due to the power consumption of the signal and clock routes. The congestion value has increased slightly relative to the post-CTS value. The slight increase is caused by the routing optimization carried out by the tool.
4) After chip sign-off - The WNS has been optimized to obtain a positive value, which indicates better timing closure, as seen in Table 6. The power is seen to have increased from the routing stage due to the buffers inserted to close timing. The overall routing congestion is seen to be lowest for peripheral placement, proving it to be a better macro placement option, along with the least WNS and power consumption.

Complete list of abbreviations is shown in Appendix I.
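The stage-wise comparison above relies on WNS as its timing metric. A minimal sketch of how such a figure can be obtained from per-endpoint slacks is given below, assuming the common convention that slack is the required time minus the arrival time and that WNS is the smallest endpoint slack. The helper names and the path timings are hypothetical and do not come from Tables 3 to 6.

```python
from typing import Iterable, List, Tuple

def endpoint_slack(required_ns: float, arrival_ns: float) -> float:
    """Slack of one timing endpoint; a negative value is a violation."""
    return required_ns - arrival_ns

def worst_negative_slack(slacks: Iterable[float]) -> float:
    """WNS under the assumed convention: the smallest endpoint slack.

    A negative result means at least one path fails timing; a positive
    result means every path meets timing with that much margin.
    """
    return min(slacks)

# Hypothetical (required, arrival) times in ns for three endpoints.
paths: List[Tuple[float, float]] = [(2.0, 2.5), (2.0, 1.5), (2.0, 2.25)]
slacks = [endpoint_slack(req, arr) for req, arr in paths]
wns = worst_negative_slack(slacks)
print(slacks, wns)
```

With this convention, a WNS that becomes less negative from one stage to the next, and finally positive at sign-off, matches the trend described for the flow.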
J. Fadnavis and Kariyappa B.S.
receiver for human body communication. IEEE [16] Sakhare S, Perumkunnil M, Bao TH, Rao S, Kim W,
Transactions on Biomedical Circuits and Systems. Crotti D, et al. Enablement of STT-MRAM as last level
2019; 13(3):566-78. cache for the high performance computing domain at
[3] Inoue K, Yano Y. A large scale access-control list for the 5nm node. In international electron devices meeting
IoT security comprising embedded IP-core and DDR 2018. IEEE.
DRAM. In international SoC design conference 2016 [17] Cho H, Kong J, Munir A, Giri NK. CT-cache:
(pp. 197-8). IEEE. compressed tag-driven cache architecture. In computer
[4] Hassan M. On the off-chip memory latency of real-time society annual symposium on VLSI 2018 (pp. 94-9).
systems: Is DDR dram really the best option? In real- IEEE.
time systems symposium 2018 (pp. 495-505). IEEE. [18] Jang G, Gaudiot JL. Data shepherding: a last level
[5] Behnam P, Bojnordi MN. STFL-DDR: improving the cache design for large scale chips. In international
energy-efficiency of memory interface. IEEE conference on high performance computing and
Transactions on Computers. 2020; 69(12):1823-34. communications; international conference on smart
[6] Soni A, Soni B, Mehta R. Congestion estimation using city; international conference on data science and
various floorplan techniques in 28nm soc design. In systems 2019 (pp. 1920-7). IEEE.
international conference on intelligent computing and [19] Lin JM, Deng YL, Li ST, Yu BH, Chang LY, Peng TW.
control systems 2020 (pp. 199-204). IEEE. Regularity-aware routability-driven macro placement
[7] Zhang Y, Peng X. A partition level floorplan method methodology for mixed-size circuits with obstacles.
based on data flow analysis for physical design of IEEE Transactions on Very Large Scale Integration
digital IC. In international conference on integrated (VLSI) Systems. 2018; 27(1):57-68.
circuits and microsystems 2017 (pp. 74-7). IEEE. [20] Lin JM, Deng YL, Yang YC, Chen JJ, Chen YC. A
[8] Garg S, Shukla NK. A study of floorplanning novel macro placement approach based on simulated
challenges and analysis of macro placement approaches evolution algorithm. In international conference on
in physical aware synthesis. International Journal of computer-aided design 2019 (pp. 1-7). IEEE.
Hybrid Information Technology. 2016; 9(1):279-90. [21] Lin JM, Li ST, Wang YT. Routability-driven mixed-
[9] Chan CK, Wu TM, Wu ML, Fan GJ, Shiah C, Lu NC, size placement prototyping approach considering
et al. Power distribution network modeling and design design hierarchy and indirect connectivity between
of re-distribution layer in DDR application. In macros. In proceedings of the annual design automation
workshop on signal and power integrity 2020 (pp. 1-4). conference 2019 (pp. 1-6).
IEEE. [22] Vidal-obiols A, Cortadella J, Petit J, Galceran-oms M,
[10] MP PK, Panda SK. Design and verification of DDR Martorell F. Multi-level dataflow-driven macro
SDRAM memory controller using systemverilog for placement guided by RTL structure and analytical
higher coverage. In international conference on methods. IEEE Transactions on Computer-Aided
intelligent computing and control systems 2019 (pp. Design of Integrated Circuits and Systems. 2020.
689-94). IEEE. [23] Uppula V, Kesav SV, Vura B. Impact on the physical
[11] Sim SW, Andersson W. On-die decoupling capacitor design flow, due to repositioning the macros in the
optimization for DDR IO interface power rail. In floorplan stage of video decoder at lower technologies.
conference on electrical performance of electronic International conference on distributed computing,
packaging and systems 2018 (pp. 229-31). IEEE. VLSI, electrical circuits and robotics 2019 ((pp. 1-6).
[12] Ejaz A, Papaefstathiou V, Sourdis I. FreewayNoC: a IEEE.
DDR NoC with pipeline bypassing. In international [24] Shaikh M, Soni B, Mehta R. Optimization of floorplan
symposium on networks-on-chip 2018 (pp. 1-8). IEEE. strategies to reduce timing violation on 28nm ASIC and
[13] Mohamed J, Michalka T, Ozbayat S, Luevano GR. scopes of improvement for data center ASICs. In
PDN design and sensitivity analysis using synthesized international conference on intelligent computing and
models in DDR SI/PI co-simulations. In electrical control systems 2020 (pp. 93-8). IEEE.
design of advanced packaging and systems symposium [25] Hu Q, Zhang MS. A collaborative optimization for
2018 (pp. 1-3). IEEE. floorplanning and pin assignment of 3D ICs based on
[14] Al-obaidy F, Asad A, Mohammadi F. Power- GA-SA algorithm. In international symposium on
management based on reconfigurable last-cache level electromagnetic compatibility & signal/power integrity
on non-volatile memories in chip-multi processors. In 2020 (pp. 434-8). IEEE.
Canadian conference of electrical and computer [26] Cheng WK, Wu CS. Machine learning techniques for
engineering 2019 (pp. 1-4). IEEE. building and evaluation of routability-driven macro
[15] Nath A, Kapoor HK. Write variation aware cache placement. In international conference on consumer
partitioning for improved lifetime in non-volatile electronics-Taiwan 2019 (pp. 1-2). IEEE.
caches. In international conference on VLSI design and [27] Wang L, Wang J, Zhang Q. Design and implementation
international conference on embedded systems 2019 of DDR SDRAM controller based on FPGA in satellite
(pp. 425-30). IEEE. navigation system. In international conference on
signal processing 2012 (pp. 456-60). IEEE.