Containing Power Dissipation in The Latest Generation of Integrated Circuits Is Testing The Ingenuity of Designers
STAY COOL
by Peter Klapproth and Francesco Pessolano

The gap between what we can theoretically manufacture in silicon and what we can realistically design has finally stabilised. This is the direct result of re-use strategies involving successively higher levels of abstraction – cell re-use, succeeded by IP re-use, succeeded, in turn, by sub-system and architecture re-use. Today, the system on chip (SoC) design community is already contemplating chip re-use – tiling several fully verified ‘chiplets’ onto a single piece of silicon. Sub-100nm CMOS process technologies will continue to make cost-effective implementation of these designs possible. The latest large-scale SoCs are already heterogeneous multi-processor systems, containing an array of CPUs, DSPs, vector processors and hardware accelerators.
LOWERING THE SWITCHING FREQUENCY OPENS UP THE POSSIBILITY OF POWER SAVINGS THROUGH A REDUCTION IN SUPPLY VOLTAGE.
phone is expected to support several hours of compute-intensive activity, at the same time retaining sufficient battery power to support mobile communications. Unfortunately, these escalating energy demands have not been matched by a corresponding increase in battery capacity.

Things will only get worse. SoC implementation in

voltage, or a combination of both. Clock frequencies can theoretically be adjusted between the maximum clock frequency that the process technology can sustain at its maximum Vdd, down to zero frequency (a stopped clock). The supply voltage can be adjusted over the specified operating voltage range of the process technology. In addition, supply voltage and clock frequency are interdependent, with higher Vdd values being required to sustain higher clock frequencies.

[Figure: Power reduction strategies for SoC. Supply domains shown: Vddcore, Vddram, Vddsoc, Vddalways.]
Dynamic frequency switching, in the form of clock gating (reducing the clock frequency to zero), has been used for some time to reduce dynamic power dissipation in areas of a chip that are temporarily idle. However, although clock gating is easy to implement, it does nothing to reduce power consumption when these areas are once again active.
Dynamic frequency scaling, on the other hand, addresses active as well as idle-state power consumption, by progressively reducing the chip’s clock frequency as the computational load on its processing resources falls, at the same time ensuring that computational tasks still complete within the real-time constraints of the application. This technique is particularly suitable for circuits that operate continuously at known performance levels, for example, peripheral devices in which the device driver software has a good knowledge of performance requirements in different operating modes. However, for circuits that operate intermittently, dynamic frequency switching may achieve equivalent power savings and be simpler to implement.
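A minimal sketch of the selection step behind dynamic frequency scaling, assuming a hypothetical table of available clock frequencies and a task whose cycle count and deadline are known (for example, supplied by device driver software). The function names, the operating points and the first-order energy model E = C × Vdd² × cycles are illustrative assumptions, not details of any particular SoC.

```python
from typing import Optional, Sequence


def lowest_clock_meeting_deadline(
    cycles_needed: int,
    deadline_s: float,
    available_hz: Sequence[float],
) -> Optional[float]:
    """Return the slowest available clock that still finishes the work in time.

    Returns None when even the fastest clock would miss the deadline.
    """
    if deadline_s <= 0:
        raise ValueError("deadline_s must be positive")
    required_hz = cycles_needed / deadline_s
    feasible = [f for f in available_hz if f >= required_hz]
    return min(feasible) if feasible else None


def dynamic_energy_j(cycles: int, switched_capacitance_f: float, vdd_v: float) -> float:
    """First-order dynamic energy for a fixed amount of work: E = C * Vdd^2 * cycles.

    The clock frequency does not appear in this model: at a fixed supply
    voltage, running slowly for longer costs the same dynamic energy as
    running fast and then stopping the clock.
    """
    return switched_capacitance_f * vdd_v ** 2 * cycles


# Hypothetical operating points and workload (assumed values).
CLOCKS_HZ = [100e6, 200e6, 400e6]
chosen_hz = lowest_clock_meeting_deadline(3_000_000, 0.020, CLOCKS_HZ)
print(chosen_hz)  # 200000000.0
```

Under this simplified model the energy per task has no frequency term, which is one way to see why stopping the clock can match frequency scaling for intermittent workloads as long as the supply voltage is left unchanged.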
Lowering the switching frequency has the added bonus of opening up the possibility of additional power savings through a reduction in supply voltage. Given that power consumption is proportional to the square of the supply voltage, this means that even a small reduction in supply voltage can have a significant benefit.
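To put a number on the square-law dependence, the sketch below uses the standard first-order CMOS relation for dynamic power, P = a × C × Vdd² × f (activity factor a, switched capacitance C, clock frequency f), and compares operating points as ratios so that a and C cancel. The function name and the example scale factors are illustrative assumptions.

```python
def relative_dynamic_power(vdd_scale: float, freq_scale: float = 1.0) -> float:
    """Dynamic power relative to the original operating point.

    vdd_scale and freq_scale are new/old ratios of supply voltage and clock
    frequency. Activity factor and switched capacitance are assumed unchanged.
    """
    if vdd_scale < 0 or freq_scale < 0:
        raise ValueError("scale factors must be non-negative")
    return vdd_scale ** 2 * freq_scale


# A 10% lower supply voltage at the same clock frequency.
print(f"{1.0 - relative_dynamic_power(0.9):.0%}")       # 19%

# Assumed example: half the clock frequency, which allows a 20% lower supply.
print(f"{1.0 - relative_dynamic_power(0.8, 0.5):.0%}")  # 68%
```

Halving the frequency alone would give a 50% reduction in this model; the additional saving in the second case comes entirely from the lower supply voltage.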
This ability to reduce the clock frequency to parts of a chip, and run those parts at reduced supply voltage, is already being exploited in SoCs. Implementation takes the form of ‘voltage islands’, in which IP blocks with a common maximum clock frequency are grouped together and powered from a separate supply voltage. For example, one recent 65nm SoC introduced by Philips has four distinct voltage

[Figure caption: Top: voltage islands; middle: dynamic power reduction techniques; bottom: static power reduction techniques]
system actively involved in task execution consume power, and the moment all tasks are completed the system automatically goes into a near-zero power consumption standby mode. Handshake Solutions has already worked with processor company ARM to produce the ARM996HS processor, the world’s first synthesisable clockless ARM9E family processor for real-time embedded low-power applications.

CUTTING STATIC POWER
Static (standby) power consumption in deep sub-micron CMOS is dominated by sub-threshold channel leakage currents. It is approximated by the following relationship:

$$P_{\text{static}} \propto V_{dd} \cdot k \cdot e^{(V_{gs} - V_t)/s} \cdot \frac{W}{L}$$

where Vdd = the supply voltage
k = dielectric constant of the gate dielectric
Vgs = transistor gate-source voltage
Vt = transistor threshold voltage
s = a process-specific parameter
W = transistor channel width
L = transistor channel length

Because W and L are fixed by the transistor design, this leaves adjustment of Vdd and Vt (via body-biasing) as the two principal ways of fine-tuning static power consumption in the finished SoC.

As with frequency switching, switching off the Vdd supply to areas of the chip that are not in use is the simplest form of control and has the advantage of reducing static power consumption in these areas to zero. However, this inevitably leads to a loss of state in the de-powered logic blocks, which means that the power consumption saved by switching these components off has to be weighed against the additional power needed to save/restore their state on entry/exit from the standby condition. In addition, on-chip voltage switching adds to design complexity, and the IR drops and timings associated with embedded Vdd switching transistors are not well handled by current EDA tools.

In situations where the internal state of IP blocks must be retained, for example in SRAM blocks, dropping Vdd to the minimum level required for state retention can also reduce power consumption. However, the savings achieved by this technique are not normally large. Forcing the CMOS transistors harder off by applying back-bias to alter their Vt holds out the promise of much more substantial static power savings for 90nm CMOS but, like the body-biasing mentioned earlier, it is not expected to scale well to 65nm CMOS and beyond.

The static power reduction techniques employed in the Philips 65nm SoC are shown in the bottom section of the ‘Power reduction strategies for SoC’ panel.

CROSSING THE BOUNDARIES
A consequence of both frequency and voltage scaling in conventional synchronous SoC architectures is that signals travelling from one clock or voltage domain to another must cross frequency and/or voltage boundaries. Boundaries between clock domains must therefore incorporate clock synchronisation mechanisms and voltage boundaries must include level-shifters, all of which adds to silicon cost and complexity. For frequency boundaries, communication latencies are also increased. This is because synchronisation mechanisms typically add several clock cycles of latency that on a CPU bus interface can reduce CPU performance. Future SoCs may therefore adopt new system architectures that help to overcome these effects.

One approach currently finding favour is the Globally Asynchronous, Locally Synchronous (GALS) architecture. This approach, which fits well with dynamic frequency scaling, breaks the chip up into a number of independent clock domains, each of which is internally synchronous, but which communicate with one another via asynchronous communications channels. The addition of routers in these communication channels offers the potential to turn these architectures into networks-on-chip. Within such an architecture, frequency or voltage switching can be used to turn entire sub-systems off for periods of time. Although originally designed to overcome the fact that timing closure was becoming increasingly difficult to achieve due to the long clock distribution and signal interconnect lengths in large-scale SoCs, GALS architectures will also contribute significantly to power savings.

Francesco Pessolano is CTO programme manager, and Peter Klapproth is a technology architect with the System Technology & Architecture Group at Philips Semiconductors
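As a worked illustration of the static power relationship quoted under ‘Cutting static power’, the sketch below takes that proportionality at face value and compares two operating points as a ratio, so that k, Vgs, W and L cancel. The function name and the value chosen for the process-specific parameter s (0.04 V) are assumptions made purely for illustration; real values depend on the process and temperature.

```python
import math


def relative_static_power(vdd_scale: float, delta_vt_v: float, s_v: float) -> float:
    """Static power after a change, relative to before.

    Derived from Pstatic proportional to Vdd * k * exp((Vgs - Vt) / s) * W / L
    with k, Vgs, W and L held constant:
      vdd_scale   new Vdd divided by old Vdd
      delta_vt_v  increase in threshold voltage Vt, in volts
      s_v         process-specific parameter s, in volts
    """
    if s_v <= 0:
        raise ValueError("s_v must be positive")
    return vdd_scale * math.exp(-delta_vt_v / s_v)


S_ASSUMED_V = 0.04  # illustrative assumption, not a value from the article

# Halving Vdd for state retention: a linear effect in this model.
print(relative_static_power(0.5, 0.0, S_ASSUMED_V))            # 0.5

# Raising Vt by 0.1 V via back-bias: an exponential effect.
print(round(relative_static_power(1.0, 0.1, S_ASSUMED_V), 3))  # 0.082
```

With these assumed numbers, a modest threshold shift cuts leakage far more than a large supply reduction, which matches the article's point that Vdd lowering for retention yields limited savings while back-bias promises much larger ones.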