
Faster Timing Closure and Power Saving for a VPU (Vision Processing Unit) SoC intended for Deep Learning and AI acceleration

Eamonn Crimmins & Amit Sharma


Intel Corporation

June 12, 2018


Europe

SNUG 2018 1
Agenda

Introduction
Design Information
Merged Switching Activity Power Reduction
Physically Aware Timing Closure
Final Leakage Recovery
Conclusions

Introduction

• What is a VPU SOC?

– Vision Processing Unit with Neural Compute Engine (NCE)


• Always-On Real-Time Intelligent Vision Based Decision Making At The Physical / Network Edge

[Figure: VPU SOC diagram; labels: VPU SOC, NCE]


• What is a VPU SOC?

– Applications
• Drones (Navigation / Tracking)
• Security (Cat or Cat-Burglar!)
• Mobile (Facial Recognition)
• AR/VR Headsets

– Low Power Network edge device


• Always-on low latency vision-based decision making
• Little or no cloud bandwidth required


• Why are Power / Speed a concern?

– The Neural Compute Engine (NCE) executes the compute-intensive neural algorithms
• Speed of response is critical (Real-Time)
• Dynamic and leakage power consumption are critical (Always-On)


[Figure: implementation flow. Each of the 12 partitions (Partition 1, Partition 2, ... Partition 11, Partition 12) goes through RTL, DC and ICC2; all partitions then feed PT, with ECO loops.]

[Figure: signoff data flow. The 12 ICC2 routed databases are extracted with STAR-RC (parasitics x 12); the netlist, parasitics, standard-cell and macro LEFs, and ICC2 partition DEFs (x 12) are read into PrimeTime, which produces PrimeTime ECO scripts (x 12) for ICC2.]

• Goals

– Dynamic and Leakage power recovery during signoff.


– Faster Turn-Around-Time for our signoff timing closure loops.
– Use advanced PrimeTime features, including:
• Multiply Instantiated Modules (MIM) support
• Switching Activity based power recovery
• Min-Vt spacing rules support
• Side-load cell down-sizing for setup fixing
• Clock network ECO

Glossary

• SHAVE: Streaming Hybrid Architecture Vector Engine
– 128-bit VLIW C-programmable processor which augments the NCE

Design Information

• Myriad X SOC

– 4Gbit LPDDR4 DRAM support


– Multiple interfaces: 16 x MIPI lanes / USB 3.1 / 10GbE / PCIe 3.0 / I2S / SD / Quad-SPI
– >4 TOPS Compute capacity
– 1 TOPS Neural network capability
– 8.1mm x 8.8mm package


• Myriad X SOC

– 12 physical partitions
– Replicated SHAVE block x16 (MIM)
– Multi-million instances
– 18 signoff PVT corners with Functional / BIST / Scan stuck-at / Scan at-speed modes
– Power optimization was targeted on the standalone NCE partition
– Timing signoff and timing ECOs were performed on a flat top-level design

Merged Switching Activity Power Reduction

• Flow Overview

– SAIF: Switching Activity Interchange Format
• ASCII file used to capture the switching toggle rate and static probability of nets and pins

– PrimeTime Switching Activity Based Power Recovery
• Uses the PT-PX power analysis engine (“update_power”) for power recovery
• Does not degrade timing DRC / setup / hold slacks

[Figure: flow from RTL-simulated SAIF generation to the merged SAIF, then PrimeTime, then ICC2]


• Merged SAIF Benefits

– Prevents the over- or under-optimization that can occur with a single SAIF


– Highlights nets/pins with true dominant switching activity
– Highlights nets/pins with true low switching activity
– More realistic toggle-rates and static probabilities


• Test-Case Selection Criteria

– 60+ real-world Neural Compute Engine algorithms simulated


• Various filter sizes (5x5/3x3/1x1)
• Object detection / Object classification
• Maximum and Optimal DPE (Data-Path-Element) node utilization

– Each individual SAIF is then weighted to produce a merged SAIF


• Engineering decision as to what weight to apply to each SAIF
• The weighted total must add up to 100%
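
The weighting step can be illustrated with a small sketch. It assumes each SAIF contributes a per-net toggle rate and that the merged value is the weight-normalized average of those rates; the net names, rates, weights and the `merge_toggle_rates` helper are all hypothetical, and the exact arithmetic used by the PrimeTime utility is not stated in these slides.

```python
def merge_toggle_rates(runs):
    """Merge per-net toggle rates from several simulations.

    runs: list of (weight_percent, {net_name: toggle_rate}) tuples.
    The weights are percentages and must add up to 100.
    """
    total = sum(weight for weight, _ in runs)
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"weights must add up to 100%, got {total}")

    merged = {}
    for weight, rates in runs:
        for net, rate in rates.items():
            # Each run contributes its rate scaled by its share of the total.
            merged[net] = merged.get(net, 0.0) + rate * weight / 100.0
    return merged


# Hypothetical example: three test-cases with engineering-chosen weights.
runs = [
    (50, {"net_a": 0.40, "net_b": 0.02}),
    (30, {"net_a": 0.10, "net_b": 0.02}),
    (20, {"net_a": 0.00, "net_b": 0.50}),
]
merged = merge_toggle_rates(runs)
```

In this made-up data, `net_b` is nearly idle in two test-cases and very active in the third. A single SAIF would report it as either almost static or dominant, whereas the merged value sits in between.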


• PT Merged SAIF utility

merge_saif -input_list {-input ${NCE_SAIF1} -weight 5 \
    -input ${NCE_SAIF2} -weight 2 \
    -input ${NCE_SAIF3} -weight 2 \
    ...
    -input ${NCE_SAIF60} -weight 1} \
    -strip_path i_tb/i_dut \
    -output merged_nce.saif
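
With 60+ inputs, the command can be generated rather than edited by hand. The sketch below is one possible way to script it, not part of the original flow; it emits only the options shown on this slide (`-input_list`, `-input`, `-weight`, `-strip_path`, `-output`), and the file names and the `build_merge_saif` helper are hypothetical.

```python
def build_merge_saif(weighted_saifs, strip_path, output):
    """Return a merge_saif command string.

    weighted_saifs: list of (saif_path, integer_weight_percent) pairs.
    Raises ValueError if the weights do not add up to 100.
    """
    total = sum(weight for _, weight in weighted_saifs)
    if total != 100:
        raise ValueError(f"weights must add up to 100%, got {total}")

    # One "-input <file> -weight <n>" pair per line, joined with Tcl
    # line continuations.
    inputs = " \\\n    ".join(
        f"-input {path} -weight {weight}" for path, weight in weighted_saifs
    )
    return (
        f"merge_saif -input_list {{{inputs}}} \\\n"
        f"    -strip_path {strip_path} \\\n"
        f"    -output {output}"
    )


# Hypothetical two-file example.
cmd = build_merge_saif(
    [("conv3x3.saif", 60), ("classify.saif", 40)],
    strip_path="i_tb/i_dut",
    output="merged_nce.saif",
)
print(cmd)
```

Checking the total before emitting the command catches a mis-weighted list before any PrimeTime runtime is spent.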


• DMSA SAIF Driven Power Optimization Setup


– Performed post timing-analysis

# Common leakage and dynamic power scenario
current_scenario {FUNC.TT}
remote_execute {
    set_app_var power_enable_analysis true
    set_app_var power_analysis_mode averaged
    # Moves clock pin power onto the data network
    # (clock network was don’t-touched for power recovery)
    set_app_var power_clock_network_include_register_clock_pin_power false

    read_saif merged_nce.saif
    update_power
}


• DMSA SAIF Driven Power Optimization


# Select all scenarios
# (timing and DRC constraints are honoured during power optimization)
current_scenario -all

# SAIF driven dynamic power reduction
# “path” pba_mode was used for all power recovery runs to speed up runtimes
fix_eco_power -pba_mode path -power_mode dynamic -dynamic_scenario FUNC.TT
fix_eco_power -pba_mode path -power_mode leakage -leakage_scenario FUNC.TT
fix_eco_power -pba_mode path -power_mode dynamic -dynamic_scenario FUNC.TT

• We found that the dynamic -> leakage -> dynamic sequence gave the best results for our design

[Figure: bar chart, “% Combinational Power Improvement vs Base Design for the most used Deep Neural Network test-cases in the Neural Compute Engine”. Categories: Internal Power, Switching Power, Leakage Power, Total Gain. Series: Dynamic Only; Dynamic + Leakage + Dynamic. Data labels: 63%, 51%, 38%, 41%, 31%, 28%, 11%, 13%.]

Physically Aware Timing Closure

• Advanced Features

– Distributed Multi-Scenario Analysis (DMSA)


• More convergent flow across all PVT and modes

pt_shell -multi_scenario ...

– Physically Aware ECO


• Minimal Physical Impact

set_eco_options -physical_tech_lib_path ${STD_CELL_LEF_LIST}


set_eco_options -physical_lib_path ${MACRO_LEF_LIST}
set_eco_options -physical_design_path ${PARTITION_DEF_LIST}


• Advanced Features

– Enable the “all-paths” path based analysis (PBA) algorithm


• Avoids the path recalculation limit for exhaustive analysis

set_app_var pba_exhaustive_endpoint_path_limit infinity

• Faster execution with maximum pessimism removed


• Advanced Features
– Enable MIM ECO support (ideal for our replicated “SHAVE” physical design)

set_app_var eco_enable_mim true


set_eco_options -mim_group [all_instances -hierarchy shave]

– Enable Advanced Technology Min-Vt Spacing Rules

set_eco_options -physical_lib_constraint_file ${advanced_tech_rules}

• Eliminates Min-Vt spacing signoff DRC violations in IC-Validator


• DRC Fixing

# Applies a limit on the area increase due to sizing
set_app_var eco_alternative_area_ratio_threshold 3.0

fix_eco_drc -type max_transition -verbose \
    -physical_mode open_site \
    -methods {size_cell} \
    -cell_type combinational

# Controls buffer insertion location
set_app_var eco_insert_buffer_search_distance_in_site_rows 8

fix_eco_drc -type max_transition -verbose \
    -physical_mode open_site \
    -methods {insert_buffer} -buffer_list {LVT_BUF_X4 LVT_BUF_X5...} \
    -cell_type combinational


• Setup Fixing

# Initial ECO loops use “path” pba_mode. Final ECO loops use “exhaustive” pba_mode
fix_eco_timing -type setup \
    -physical_mode open_site \
    -methods {size_cell} \
    -cell_type combinational \
    -group {shave_2_shave ...} \
    -pba_mode exhaustive

# Reduces the parasitic capacitance load on timing critical nets by
# downsizing net side-loads
fix_eco_timing -type setup \
    -physical_mode open_site \
    -methods {size_cell_side_load} \
    -cell_type combinational \
    -group {shave_2_shave ...} \
    -pba_mode exhaustive
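
Why side-load downsizing helps setup can be seen with a first-order delay model. The sketch below assumes the driver delay is proportional to its total load (drive resistance times the sum of wire and input-pin capacitances); every resistance and capacitance value is made up for illustration and none comes from the design or its libraries.

```python
def driver_delay_ps(r_drive_kohm, c_wire_ff, pin_caps_ff):
    """First-order driver delay: R * C_total.

    kOhm * fF gives picoseconds directly (1e3 * 1e-15 = 1e-12 s).
    """
    return r_drive_kohm * (c_wire_ff + sum(pin_caps_ff))


R_DRIVE = 2.0        # kOhm, hypothetical driver strength
C_WIRE = 10.0        # fF, hypothetical net capacitance
CRITICAL_PIN = 2.0   # fF, input pin of the cell on the critical path

# Two large side-load cells (not on the critical path) hang on the same net.
before = driver_delay_ps(R_DRIVE, C_WIRE, [CRITICAL_PIN, 8.0, 8.0])

# After the ECO the side-loads are swapped to smaller cells with less pin cap.
after = driver_delay_ps(R_DRIVE, C_WIRE, [CRITICAL_PIN, 2.0, 2.0])

saving = before - after
print(f"driver delay: {before} ps -> {after} ps (saves {saving} ps)")
```

The critical path gains slack without any of its own cells changing. The downsized side-load cells become slower themselves, so the technique only applies where their own paths have slack to spare.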


• Hold Fixing
# Targeted path-group fixing
fix_eco_timing -type hold -physical_mode open_site \
    -methods {size_cell} \
    -group {shave_2_shave ...} -pba_mode exhaustive \
    -cell_type combinational

# Low Leakage High Vt cells used for buffer insertion
fix_eco_timing -type hold -physical_mode open_site \
    -methods {insert_buffer} \
    -buffer_list {HVT_BUF_X1 HVT_BUF_X2 HVT_DEL05_X1 HVT_DEL10_X2 ...} \
    -group {shave_2_shave ...} -pba_mode exhaustive \
    -cell_type combinational


• Advanced Features
– Clock network ECO (Useful skew)

# Import physical clock information
set_eco_options -physical_enable_clock_data

# Only allow clock LVT cells to be used during sizing
set_user_attribute [get_lib_cells */*LVT*] is_clk_cell true
set_app_var eco_alternative_cell_attribute_restrictions is_clk_cell

# The -clock_* limits prevent an overly intrusive clock ECO from being performed
fix_eco_timing -type setup -physical_mode open_site \
    -cell_type clock_network -pba_mode exhaustive \
    -methods {size_cell} \
    -clock_max_level_from_reg 2 \
    -clock_fixes_per_change 2


• How to debug unfixable violations?


• Advanced Features

– IMSA (Interactive Multiple Scenario Analysis) GUI


• Very useful for debugging “Unfixable reasons” messages output from fix_eco_timing
• Session with a collection of paths exported from each DMSA slave
• All timing path information loaded into the GUI layout viewer
• IMSA ECO operations support buffer insertion, resizing, and buffer removal


• Additional Steps

– Initial ECO loops used “-physical_mode open_site”


• Allows ECO to size-cells or insert buffers into empty sites only
– Additional 5-10% DRC / Setup / Hold fixing using “-physical_mode occupied_site” mode
• Allows ECO to overlap existing cells
– After all ECO loops >90% DRC and Setup violations and >98% Hold violations were fixed
– Remaining violations were fixed logically
• Any large cell displacements in ICC2 were investigated
– Each PT ECO loop took approximately 6 hours


Setup NVP and Setup TNS per ECO loop

ECO loop   Setup NVP   Setup TNS
ECO 1      2685        -36.988
ECO 2      1726        -23.778
ECO 3      1380        -19.0224
ECO 4       959        -13.21
ECO 5       460        -6.3408
ECO 6       220        -3.04358
ECO 7        55        -0.093
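
The convergence across the loops can be quantified from the chart's data labels. The sketch assumes the labels map to ECO 1 through ECO 7 in order of decreasing violation count and decreasing TNS magnitude, which is how the bars appear to be ordered; the `reduction_pct` helper is hypothetical.

```python
# Data labels from the Setup NVP / Setup TNS charts, ECO 1 .. ECO 7
# (the label-to-loop mapping is an assumption).
setup_nvp = [2685, 1726, 1380, 959, 460, 220, 55]
setup_tns = [-36.988, -23.778, -19.0224, -13.21, -6.3408, -3.04358, -0.093]


def reduction_pct(first, last):
    """Percentage reduction from the first value to the last."""
    return 100.0 * (first - last) / first


nvp_fixed = reduction_pct(setup_nvp[0], setup_nvp[-1])
tns_fixed = reduction_pct(abs(setup_tns[0]), abs(setup_tns[-1]))

# Loop-over-loop reduction shows whether the flow keeps converging.
per_loop = [
    reduction_pct(prev, cur) for prev, cur in zip(setup_nvp, setup_nvp[1:])
]

print(f"Setup NVP reduced by {nvp_fixed:.1f}% between ECO 1 and ECO 7")
print(f"Setup TNS reduced by {tns_fixed:.1f}% between ECO 1 and ECO 7")
```

Under that mapping, every loop removes a sizeable fraction of the remaining violations, and the later loops are at least as effective as the early ones.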
Final Leakage Recovery

• Leakage Recovery

– Final signoff step


– Performed on full flat design
– Existing setup / hold / DRC slacks are honored
– All PVT corners and functional modes included


• Leakage Recovery Setup

– Cells are grouped into appropriate threshold_voltage_groups

remote_execute {
set_user_attribute [get_lib_cells */*HVT20*] threshold_voltage_group HVT20
set_user_attribute [get_lib_cells */*HVT16*] threshold_voltage_group HVT16
set_user_attribute [get_lib_cells */*SVT20*] threshold_voltage_group SVT20
set_user_attribute [get_lib_cells */*SVT16*] threshold_voltage_group SVT16
set_user_attribute [get_lib_cells */*LVT20*] threshold_voltage_group LVT20
set_user_attribute [get_lib_cells */*LVT16*] threshold_voltage_group LVT16
set_user_attribute [get_lib_cells */*ULT20*] threshold_voltage_group ULT20
set_user_attribute [get_lib_cells */*ULT16*] threshold_voltage_group ULT16
}


• Leakage Recovery Setup

– Cell sizing limited to cells with equivalent area

remote_execute {
set_app_var power_enable_analysis true
set_app_var power_analysis_mode averaged
set_app_var eco_alternative_area_ratio_threshold 1
define_user_attribute cell_footprint -class lib_cell -import -type string
set_app_var eco_alternative_cell_attribute_restrictions {cell_footprint}
}


• Leakage Recovery

– Leakage pattern priority used to define cell selection priority


• HVT > SVT > LVT > ULVT
– A positive setup margin restriction of 50ps was applied
– “path” pba_mode was used to speed up runtime
– Both combinational and sequential swapping were allowed

fix_eco_power -pba_mode path \
    -pattern_priority <LEAKAGE_PATTERN_PRIORITY_LIST> \
    -attribute threshold_voltage_group \
    -verbose -setup_margin 0.050 \
    -cell_type {combinational sequential}
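
The selection rule described on this slide can be sketched as follows. This is not PrimeTime's algorithm, only an illustration of the stated priority (HVT > SVT > LVT > ULVT) combined with the 50 ps setup margin; the per-group delay penalties and the `pick_vt` helper are hypothetical, and hold and DRC checks are left out.

```python
# Most preferred (lowest leakage) group first, as on the slide.
PRIORITY = ["HVT", "SVT", "LVT", "ULVT"]
SETUP_MARGIN_NS = 0.050


def pick_vt(current_vt, slack_ns, delay_penalty_ns):
    """Return the most preferred Vt group that still leaves the margin.

    delay_penalty_ns[vt] is the extra path delay (ns) incurred by swapping
    the cell from its current group to `vt` (hypothetical numbers).
    """
    for vt in PRIORITY:
        if vt == current_vt:
            # Nothing more preferred was feasible: keep the cell as it is.
            return current_vt
        if slack_ns - delay_penalty_ns[vt] >= SETUP_MARGIN_NS:
            return vt
    return current_vt


# Hypothetical penalties for a ULVT cell swapped to each slower group.
penalties = {"HVT": 0.090, "SVT": 0.060, "LVT": 0.030}

choice_with_slack = pick_vt("ULVT", 0.120, penalties)
choice_tight = pick_vt("ULVT", 0.040, penalties)
print(choice_with_slack, choice_tight)
```

A cell with 120 ps of slack cannot reach HVT without eating into the margin, so it stops at the next group down; a cell with only 40 ps of slack is already inside the margin and stays on ULVT.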


• Leakage Recovery

– 24.5% reduction in ULVT usage


– 24.7% reduction in LVT usage
– Leakage recovery run took 48 hours with 16 cores per scenario to complete


[Figure: bar chart, “% Saving in Total Leakage Power”. Data labels: 29.3%, 21.0%, 21.9%, 14.9%, 16.2%, 7.0%, 10.7%, 8.7%, 9.3%, 11.5%, 10.7%, 7.6%, 7.6%, 5.1%, 5.9%.]

Conclusions

• >30% combinational power reduction in the NCE achieved using the merged
switching activity flow

• Faster ECO Turn-Around-Time (TAT) during Signoff phase with PT ECO loops
completed in under 6 hours and 90% of DRC / Setup and 98% of Hold
violations fixed with 7 ECO loops.

• Almost 15% leakage power recovered on the signoff database while honoring
timing DRC / setup / hold

Thank You
