QUICKRECALL: A Low Overhead HW/SW Approach For Enabling Computations Across Power Cycles in Transiently Powered Computers
I. INTRODUCTION
Transiently powered computers (TPCs) [1] represent a new
class of ultra-low power embedded computing platforms that
are batteryless and rely solely on external power sources for
their energy supply. Examples of such TPCs include computational RFID tags [2], batteryless sensors [3], etc. Successfully
performing computations on TPCs is a major challenge due to
the unpredictable and highly intermittent nature of the power
supply. For example, a TPC may receive power in small
intermittent bursts (often less than 100ms), far lower than the
time required to execute most programs.
Existing techniques to address this challenge are based on
the idea of frequent checkpointing of system state. When power
loss is imminent, a snapshot (checkpoint) of system state
(e.g., processor registers, contents of SRAM) is stored to flash
memory, which is non-volatile. During the next burst of power,
the system reboots, restores state from the stored checkpoint,
and resumes program execution. Thus, long-running programs
execute gradually, in small increments, as and when power
becomes available. However, checkpointing to flash involves a
significant time and energy overhead due to the high erase/write
times of flash memory (tens of ms). As a result, a large portion
of the time when a TPC receives power (henceforth referred
to as the ON time) is spent performing checkpointing, which
limits the amount of time available for program execution.
More importantly, if the ON time is less than the time required
for storing and retrieving checkpoints, the TPC can never
successfully complete program execution.
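
The feasibility condition above can be made concrete with a small model. The sketch below is illustrative only: the function names and the sample timings are assumptions chosen for this example, not measurements from this work.

```python
def useful_time_ms(on_time_ms: float, restore_ms: float, store_ms: float) -> float:
    """Time left for program execution in one power cycle (hypothetical model).

    Each ON period must pay for restoring the previous checkpoint and
    storing a new one; whatever remains is available to the program.
    """
    return max(0.0, on_time_ms - restore_ms - store_ms)


def can_make_progress(on_time_ms: float, restore_ms: float, store_ms: float) -> bool:
    """True if the ON time exceeds the combined checkpoint overhead."""
    return useful_time_ms(on_time_ms, restore_ms, store_ms) > 0.0


# Assumed numbers: a 100 ms burst with a 5 ms restore and a 20 ms store.
print(useful_time_ms(100.0, 5.0, 20.0))
# Assumed numbers: a 15 ms burst cannot cover the same 25 ms of overhead.
print(can_make_progress(15.0, 5.0, 20.0))
```

In this model, shrinking the store and restore costs directly enlarges the share of each ON period that is spent on the program itself.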
Recent advances in semiconductor technology have resulted
1063-9667/14 $31.00 2014 IEEE
DOI 10.1109/VLSID.2014.63
III. MOTIVATION
Recent years have witnessed the emergence of non-volatile
memories (NVMs) such as Ferroelectric RAM (FRAM) and
Magnetoresistive RAM (MRAM). In addition to being non-volatile, these memories have distinct advantages over flash in
terms of power consumption, performance, endurance, etc.
A. Ferroelectric RAM: A candidate for unified memory
The significant performance and energy overhead of flash,
due to its inherent device limitations, is the primary motivator
for employing FRAM in embedded systems. Flash memory bit cells can only be written from logic 1 to logic 0. Writing a logic
1 to a cell that was previously set to logic 0 requires the flash
bit cell to be erased first. Depending upon the flash memory
size and architecture, the smallest memory unit for erasure can
vary. As an example, for the MSP430F2132 microcontroller
used in the BlueWISP RFID platform, the smallest erasable
unit is a segment of size 512 bytes, and erasing it takes 10 ms
to 18 ms. Moreover, an erase operation requires a higher voltage
and is therefore energy-expensive [9]. An FRAM memory
cell is DRAM-like in structure and uses the polarization of a
ferroelectric capacitor to distinguish between the logic states
[10]. Thus, FRAM is random-access for reads and writes and
requires no erase operations. Even though FRAM involves a
destructive read, the write-back is hidden and instantaneous,
thereby presenting almost no latency overhead to the system.
Consequently, while flash memories present asymmetric read-write latencies, FRAM access latencies are symmetric. Another
limitation of flash memory is its limited endurance.
While the endurance limit for flash memory is around 10^5
erase/write cycles, FRAM devices have an endurance almost
10 orders of magnitude greater than flash [7].
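
The erase-before-write constraint described above can be illustrated with a toy model. This is a minimal sketch, not driver code: the function names are hypothetical, the 512-byte segment size is taken from the MSP430F2132 example, and the erase step is simplified in that it does not preserve the other bytes of the segment, which a real driver would have to save and rewrite.

```python
ERASED = 0xFF  # an erased flash byte reads as all ones
SEGMENT_SIZE = 512  # smallest erasable unit in the MSP430F2132 example


def flash_program(cell: int, value: int) -> int:
    """Program a byte without erasing: bits can only move from 1 to 0."""
    return cell & value


def flash_write(segment: list[int], index: int, value: int) -> bool:
    """Write one byte, erasing the whole segment first if any bit must go 0 -> 1.

    Returns True if an erase was required.
    """
    needs_erase = (segment[index] & value) != value
    if needs_erase:
        # Simplification: the whole segment is reset and its old contents are lost.
        for i in range(len(segment)):
            segment[i] = ERASED
    segment[index] = flash_program(segment[index], value)
    return needs_erase


def fram_write(memory: list[int], index: int, value: int) -> None:
    """FRAM model: any byte can be overwritten directly, with no erase step."""
    memory[index] = value


segment = [ERASED] * SEGMENT_SIZE
print(flash_write(segment, 0, 0x0F))  # only clears bits, so no erase is needed
print(flash_write(segment, 0, 0xF0))  # needs bits set back to 1, so an erase is needed
```

The second write is the expensive case: in flash it triggers a segment erase, whereas the FRAM model performs the same update as a single store.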
B. Software Flow
Writing applications aimed at resuming computations across
power cycles requires minor variations to the traditional embedded programming style. Previous work has tried to address this
for large-scale systems [7]. QUICKRECALL uses a similar flow,
although we design for scenarios where the system is severely
power-deprived. QUICKRECALL places two requirements on
the programmer in this regard. First, the programmer has to
use the predefined QUICKRECALL global variables, which store
the state. The memory addresses of the program symbols
reside in the ELF executable. Hence, the memory map for
a particular program remains unchanged across reboots. The
extra variables required for data retention are allocated in the
.bss section as uninitialized global variables. The variables required
V. DESIGN IMPLEMENTATION
This section describes the implementation and experimental
setup for QUICKRECALL.
B. Determining Vtrig
A. Experimental setup
TABLE I
PROGRAM EXECUTION TIME (CPU FREQ = 8 MHz)

Program   Overhead (a)                     Total Runtime
CRC       8.18 µs + 1.854 ms + 4.4 µs      551 ms
RSA       8.18 µs + 1.854 ms + 4.4 µs      11.12 s
SENSE     8.18 µs + 20 ms + 4.4 µs         79 ms

(a) Store Overhead + Initialization Overhead + Restore Overhead
\[
\mathrm{Slowdown} = \frac{\sum_{i=1}^{n} CW_i}{\mathrm{TotalRuntime}} \tag{1}
\]
Slowdown arises from the overhead that checkpointing schemes incur to store and restore the system snapshot. Note
that in the above definition, the amount of time the MCU
is in the OFF state does not contribute to the calculation of
slowdown.
3) Single Life Cycle: We define the execution of a program
in a single continuous run in the absence of power loss as a
single life cycle execution of the program.
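
Equation (1) can be evaluated with a few lines of code. The sketch below rests on assumptions that are not spelled out in the surviving text: CW_i is taken to be the ON-time duration of the i-th power cycle, the sum runs over all power cycles needed to finish the program, and TotalRuntime is taken to be the single life cycle execution time. The function name and sample values are hypothetical.

```python
def slowdown(on_times_ms: list[float], single_life_cycle_ms: float) -> float:
    """Ratio of total ON time across power cycles to the uninterrupted runtime.

    Assumption: each entry of on_times_ms plays the role of CW_i in Eq. (1).
    OFF time is deliberately excluded, as stated in the text.
    """
    if single_life_cycle_ms <= 0:
        raise ValueError("single life cycle runtime must be positive")
    return sum(on_times_ms) / single_life_cycle_ms


# Assumed example: a program that needs 100 ms uninterrupted, but takes
# three ON windows totalling 150 ms once checkpoint overhead is included.
print(slowdown([60.0, 60.0, 30.0], 100.0))
```

Under these assumptions, a slowdown of 1 would mean that no ON time is lost to storing and restoring state.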
B. Results
To evaluate QUICKRECALL, three test programs were used,
namely CRC, RSA and SENSE. CRC calculates a 16-bit CRC
and a 32-bit CRC of a message using polynomials. RSA
does a 64-bit encryption on 128 characters. The program then
decrypts the encrypted value and verifies correctness. SENSE
senses accelerometer data, processes it using a low-pass filter,
and then performs statistical computations such as finding the
minimum, maximum, mean, and standard deviation of the
collected data. SENSE implements nested interrupts as well
as dynamic memory allocation on the heap¹.
The overhead introduced by QUICKRECALL per power cycle
and the single life cycle execution time for each test program
are given in Table I. QUICKRECALL overhead comprises
checkpointing (storing) overhead and wake-up overhead. Wake-up overhead comprises an initialization overhead and a restoring overhead. Initialization overhead denotes the time spent
1 Mementos, a checkpointing scheme for TPCs, does not support dynamic
memory allocation.
2 When the flash is not being erased in a power cycle, the only overhead for
conventional checkpointing schemes is the write operation.
[Figure: plots not recoverable from the extracted text. Surviving labels: y-axes "Slowdown (x)" and "Normalized Runtime"; programs CRC, RSA, SENSE; x-axis ticks from 10 to 100.]
VII. CONCLUSION
In this work, we have successfully implemented and demonstrated QUICKRECALL, a scheme that reduces the checkpointing overhead by 100x-1000x in each power cycle and