QUICKRECALL: A Low Overhead HW/SW Approach For Enabling Computations Across Power Cycles in Transiently Powered Computers

2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems
Hrishikesh Jayakumar, Arnab Raha, and Vijay Raghunathan
School of Electrical and Computer Engineering, Purdue University
Email: {hjayakum, araha, vr}@purdue.edu
Abstract: Transiently Powered Computers (TPCs) are a new
class of batteryless embedded systems that depend solely on energy
harvested from external sources for performing computations.
Enabling long-running computations on TPCs is a major challenge
due to the highly intermittent nature of the power supply (often
bursts of < 100ms), resulting in frequent system reboots. Prior
work seeks to address this issue by frequently checkpointing
system state in flash memory, preserving it across power cycles.
However, this involves a substantial overhead due to the high
erase/write times of flash memory. This paper proposes the use
of FRAM, an emerging non-volatile memory technology that
combines the benefits of SRAM and flash, to seamlessly enable
long-running computations in TPCs. We propose a lightweight,
in-situ checkpointing technique for TPCs using FRAM that
decreases the time taken for saving and restoring a checkpoint
to only 12.6µs, which is over two orders of magnitude lower
than the corresponding overhead using flash. We have implemented
and evaluated our technique, QUICKRECALL, using the
TI MSP430FR5739 FRAM-enabled microcontroller. Experimental
results show that our highly efficient checkpointing translates to
a significant speedup (1.4x - 4.5x) in program execution time.
in new forms of memory technologies such as Ferroelectric
RAM (FRAM), Magnetoresistive RAM (MRAM), etc., that
combine the speed, flexibility, and endurance of SRAM with
the non-volatility of flash, all at a very low power consumption.
This has led to the possibility of unified memory where the
same type of memory technology is used as RAM and for
non-volatile program and data storage. Low power microcontrollers
with integrated FRAM are already commercially available. For
example, the TI MSP430FR5739 has 16KB of FRAM that can
be used as unified memory [4]. This paper makes a case for
(and demonstrates the benefits of) using such emerging
non-volatile memories in TPCs. Specifically, this paper makes the
following contributions:
• To the best of our knowledge, this is the first work
  to investigate the use of emerging non-volatile memory
  technologies (specifically FRAM) in TPCs to seamlessly
  enable long-running computations in the presence of
  frequent power interruptions.
• We propose a lightweight, in-situ checkpointing technique,
  called QUICKRECALL, for TPCs that use FRAM.
  QUICKRECALL can save and restore a checkpoint in just
  12.6µs, which is over two orders of magnitude lower
  than the corresponding overhead using flash memory.
• We have implemented QUICKRECALL using a TI
  MSP430FR5739 microcontroller and evaluated it using
  three typical embedded application programs. Experimental
  results show that the highly efficient checkpointing in
  QUICKRECALL results in a significant reduction (as much
  as 4.5x) in program execution time, compared to a
  state-of-the-art flash-based checkpointing technique.
• QUICKRECALL enables TPCs to perform computations
  when ON times are as small as 5ms, as compared to
  previous flash-based checkpointing methods, which require
  a minimum of 15ms ON time.
The remainder of this paper is organized as follows. Section
II describes related work. Section III makes the case for
using FRAM as unified memory in TPCs. Section IV presents
the design requirements (and tradeoffs) for enabling efficient
checkpointing in TPCs that use FRAM. Section V describes
our implementation of QUICKRECALL. Section VI presents our
experimental results and Section VII concludes the paper.
I. INTRODUCTION
Transiently powered computers (TPCs) [1] represent a new
class of ultra-low power embedded computing platforms that
are batteryless and rely solely on external power sources for
their energy supply. Examples of such TPCs include computational RFID tags [2], batteryless sensors [3], etc. Successfully
performing computations on TPCs is a major challenge due to
the unpredictable and highly intermittent nature of the power
supply. For example, a TPC may receive power in small
intermittent bursts (often less than 100ms), far lower than the
time required to execute most programs.
Existing techniques to address this challenge are based on
the idea of frequent checkpointing of system state. When power
loss is imminent, a snapshot (checkpoint) of system state
(e.g., processor registers, contents of SRAM) is stored to flash
memory, which is non-volatile. During the next burst of power,
the system reboots, restores state from the stored checkpoint,
and resumes program execution. Thus, long-running programs
execute gradually, in small increments, as and when power
becomes available. However, checkpointing to flash involves a
significant time and energy overhead due to the high erase/write
times of flash memory (tens of ms). As a result, a big portion
of the time when a TPC receives power (henceforth referred
to as the ON time) is spent performing checkpointing, which
limits the amount of time available for program execution.
More importantly, if the ON time is less than the time required
for storing and retrieving checkpoints, the TPC can never
successfully complete program execution.
Recent advances in semiconductor technology have resulted
1063-9667/14 $31.00 2014 IEEE
DOI 10.1109/VLSID.2014.63
II. RELATED WORK

Checkpointing schemes have long been used for fault tolerance in large-scale distributed systems. Checkpointing, performed at previously determined trigger points in the program,
stores a snapshot of system state in non-volatile memory. In
case of a fault, the system rolls back to the most recent
B. Unified Memory for TPCs

Converting any embedded program into an executable binary
involves the steps of compiling, assembling and linking. The
assembler creates object files for each source file of the
program. The different object files contain program information
in contiguous memory locations called sections. The linker is
tasked with combining sections, across multiple object files,
into a single executable file as well as mapping these sections
into memory. Conventionally, the linker allocates the uninitialized
sections onto the RAM for run-time initialization, whereas
the global/static variables that are initialized and the program
code reside on the ROM. For example, in the MSP430
microcontroller, the bss, data, sysmem (heap), and stack
sections reside in the RAM while all the other sections are
allocated to the ROM. While the previous subsection established
the advantages of FRAM over flash, its random access
and write-in-place properties also allow FRAM to be utilized as
RAM, thus enabling it to serve as a unified memory technology.
checkpoint and continues execution [5], [6]. Trigger points

are usually periodic in nature or programmer-inserted. While
checkpointing an application in this manner ensures a rollback point, it impedes normal program execution and causes
additional overhead. A checkpointing scheme for large-scale
systems using FRAM was previously explored in [7]. However,
the underlying design goals and targeted systems are very
different from those considered in this paper.
Mementos [1] is a checkpointing solution aimed at TPCs.
It instruments user-written code at compile time with trigger
points, which compare the supply voltage with a threshold and
trigger a checkpoint if the supply voltage is less than the threshold. Trigger points are inserted at the end of each iteration of a
loop or after a function return statement. Mementos addresses
the extra overhead by using a timer to periodically enable
trigger points. Typical checkpointing methods, like Mementos,
use flash memory for storing checkpoints. Flash memory
erases/writes are cumbersome due to the large performance and
energy overhead they present. The overhead for checkpointing
the same program differs due to the variable stack depth at each
trigger point. In contrast, by using an emerging non-volatile
memory (NVM), QUICKRECALL sidesteps the data transfer latency
and maximizes the time available for computation in each
power cycle. Finally, Idetic [8] targets ASIC implementations
of applications and embeds checkpoints during the behavioral
synthesis process. In contrast, QUICKRECALL enables and
maximizes the execution time of applications in off-the-shelf
microcontrollers utilizing an emerging NVM.
IV. DESIGN METHODOLOGY

Next, we discuss the requirements and tradeoffs associated
with enabling computations across power cycles in TPCs.
A. Checkpointing and Wake-up Overhead
To enable computations across power cycles, the application
needs to store the program and processor states to non-volatile
storage before power is lost. In conventional checkpointing
schemes, the checkpoint triggers are either periodic in nature or
programmer-inserted at vantage locations in the program. While
checkpointing an application in this manner ensures a rollback point, it impedes normal program execution and causes
additional overhead.
The first design choice that QUICKRECALL makes is that,
for transiently powered computers, only a drop in the supply
voltage should trigger a checkpoint of the current system state.
Such a checkpointing scheme does not impede normal program
execution and only triggers a checkpoint if power loss is
imminent. However, one should note that, in such a scheme,
it is imperative that checkpointing be successfully completed
before power is lost. QUICKRECALL ensures this by choosing
an appropriate trigger voltage to interrupt the program and
initiate the checkpointing operation.
We define the system context to consist of program state,
processor state, and the state of configuration registers of
various peripheral subsystems. Each of these pieces of
state information has to be retained for a successful recall
and resumption of computation across power cycles.
QUICKRECALL introduces very little overhead to retain the state of
the TPC. The overhead introduced comprises checkpointing
overhead and wake-up overhead. Checkpointing overhead is
defined as the time required to store the system state before
a power loss. Wake-up overhead is defined as the time spent
in restoring the system state on power-up. A discussion of the
overheads introduced and design choices for QUICKRECALL
follows.
1) Retaining Program State: The program state consists of
the values of the global variables, stack, heap, bss, etc.,
in use by the program. Conventionally, the linker maps the
code section to a non-volatile storage like flash, and the data,
bss, and stack sections to the volatile SRAM. Figure 1
shows the proposed linker map of a microcontroller system that
III. MOTIVATION
Recent years have witnessed the emergence of non-volatile
memories (NVMs) such as Ferroelectric RAM (FRAM) and
Magnetoresistive RAM (MRAM). In addition to being non-volatile, these memories have distinct advantages over flash in
terms of power consumption, performance, endurance, etc.
A. Ferroelectric RAM: A candidate for unified memory
The significant overhead in performance and energy of flash,
due to its inherent device limitations, is the primary motivator
for employing FRAM in embedded systems. Flash memory bitcells
can only be written from logic 1 to logic 0. Writing a logic
1 to a cell that was previously set to logic 0 requires the flash
bitcell to be erased first. Depending upon the flash memory
size and architecture, the smallest memory unit for erasure can
vary. As an example, for the MSP430F2132 microcontroller
used in the BlueWISP RFID platform, the smallest erasable
unit is a segment of size 512 bytes and erasing it takes 10ms
to 18ms. Moreover, an erase operation requires a higher voltage
and is, therefore, energy expensive [9]. An FRAM memory
cell is DRAM-like in structure and uses the polarization of a
ferroelectric capacitor to distinguish between the logic states
[10]. Thus, FRAM is random-access for reads and writes and
requires no erase operations. Even though FRAM involves a
destructive read, the write-back is hidden and instantaneous,
thereby presenting almost no latency overhead to the system.
Consequently, while flash memories present asymmetric
read-write latencies, FRAM access latencies are symmetric. Another
limitation of flash memory is its limited endurance.
While the endurance limit for flash memory is around $10^5$
erase/write cycles, FRAM devices have an endurance almost
10 orders of magnitude greater than flash [7].
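The erase-before-write asymmetry described in this subsection can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the class names `ToyFlashSegment` and `ToyFram` are hypothetical, the 512-byte segment size is borrowed from the MSP430F2132 example, and no timing or energy behavior is modeled.

```python
SEGMENT_SIZE = 512  # bytes; smallest erasable unit in the MSP430F2132 example


class ToyFlashSegment:
    """Toy flash model: programming can only clear bits (1 -> 0)."""

    def __init__(self):
        self.cells = bytearray([0xFF] * SEGMENT_SIZE)  # erased state is all 1s
        self.erase_count = 0

    def erase(self):
        # Erasing resets every bit in the whole segment back to 1.
        self.cells = bytearray([0xFF] * SEGMENT_SIZE)
        self.erase_count += 1

    def program(self, addr, value):
        # AND models that a programmed 0 cannot return to 1 without an erase.
        self.cells[addr] &= value

    def write(self, addr, value):
        """Store value; erase the whole segment first if any bit must go 0 -> 1."""
        if (self.cells[addr] & value) != value:
            saved = bytes(self.cells)
            self.erase()
            for i, old in enumerate(saved):
                if i != addr:
                    self.program(i, old)  # put back the untouched bytes
        self.program(addr, value)


class ToyFram:
    """Toy FRAM model: any byte can be overwritten in place, no erase step."""

    def __init__(self, size=SEGMENT_SIZE):
        self.cells = bytearray(size)

    def write(self, addr, value):
        self.cells[addr] = value


flash = ToyFlashSegment()
flash.write(0, 0x0F)   # only clears bits: no erase needed
flash.write(0, 0xF0)   # needs bits to go 0 -> 1: forces a segment erase
fram = ToyFram()
fram.write(0, 0x0F)
fram.write(0, 0xF0)    # plain overwrite
print(flash.cells[0], flash.erase_count, fram.cells[0])
```

In this model a single byte update can force a full segment erase in flash, while the FRAM write is a plain in-place store, which is the property the unified-memory argument relies on.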
Fig. 1. QUICKRECALL linker map
uses an NVM technology such as FRAM as unified memory.

The same non-volatile memory is partitioned by the linker to
include all the sections. The non-volatile memory now acts
as the conventional RAM as well as the ROM. As a result,
while the MCU powers off, the RAM data is saved in-situ.
Similarly, while waking up, the program can pick up the data
from exactly the same address locations. By using FRAM as the
RAM, QUICKRECALL is superior to previous checkpointing
schemes as there is no time or energy overhead incurred to
retain RAM data.
2) Retaining Processor State: Capturing the processor state
involves retaining the state of the microcontroller register file,
which includes the program counter (PC), stack pointer (SP),
status register (SR), and General Purpose Registers (GPRs).
The number of GPRs in use depends on the program state.
For the same program at different execution stages, a variable
number of GPRs might be in use. A software approach to track
the number of active GPRs would hamper normal program
execution. Hence, QUICKRECALL saves the values of all the
registers onto FRAM during checkpointing. This step involves
data transfer and introduces some checkpointing overhead.
3) Retaining Microcontroller and Peripheral Settings: Common
microcontroller applications use multiple peripherals to
gather data from sensors and to communicate with the external
world. The microcontroller and peripheral settings that have
to be configured before execution include GPIO directions,
GPIO functions, and clock properties. For transiently powered
computers, it is pertinent to restore the MCU and peripheral
state when waking up to resume correct program execution.
QUICKRECALL addresses this problem by carefully structuring
programs used for transiently powered computers. Every time
the microcontroller boots up, the configuration registers are
re-initialized to their last known state. This step contributes
to the wake-up overhead and the duration of the overhead is
application and program dependent.
Fig. 2. QUICKRECALL Software Flow
for QUICKRECALL include a checkpoint flag, in addition to
memory required to store the GPRs, SR, SP, and PC. Second,
the programmer has to specify the initialization routine in a
function which QUICKRECALL can use while recalling the
system state.
As shown in Figure 2, the QUICKRECALL software flow has
two boot sequences upon powering up. Upon boot, QUICKRECALL
checks the checkpoint flag, which is declared globally.
An unset flag indicates a normal boot sequence. The normal
boot sequence initiates a call to the main() function.
The main() function begins by initializing the MCU and
peripherals, and then executes the program. While executing
the application program, the MCU is interrupted if the supply
voltage goes below a preset trigger voltage (Vtrig). An explanation of
how we arrived at Vtrig for an example platform is given in
Section V-B. Upon entering the ISR, the program context gets
pushed onto the stack. QUICKRECALL proceeds with storing
the current SR, SP, and the GPRs in predefined variables. Note
that these registers now point to the ISR state. QUICKRECALL
then proceeds to set the checkpoint flag and saves the PC. Thus,
the system is safe for a power loss and can recall this state
on the following boot. The ISR spends any remaining time
comparing the supply voltage to the trigger voltage. If the
supply voltage rises above the trigger voltage, a reverse context
switch takes place and the program continues until the supply
voltage drops again. Alternatively, the microcontroller can lose
power and shut off with the entire system state saved for a
future recall.
B. Software Flow
Writing applications aimed at resuming computations across
power cycles requires minor variations to the traditional embedded
programming style. Previous work has tried to address this
for large-scale systems [7]. QUICKRECALL uses a similar flow
although we design for scenarios where the system is severely
power-deprived. QUICKRECALL places two requirements on
the programmer in this regard. First, the programmer has to
use the predefined QUICKRECALL global variables which store
the state. The memory addresses of the program symbols
reside in the ELF executable. Hence, the memory map for
a particular program remains unchanged across reboots. The
extra variables required for data retention are allocated in the
bss as uninitialized global variables. The variables required
On the next power up, a set checkpoint flag launches the
QUICKRECALL boot sequence that recalls the system state. It
begins by restoring the stack pointer, following which the MCU
and peripheral subsystems are re-initialized. The stack pointer
is restored first so that the re-initialization routine may use
the remainder of the stack without corrupting the checkpointed
portion of the stack. QUICKRECALL's boot sequence then stalls
execution until the supply voltage surpasses the trigger voltage.
Note that, even though the peripherals have been initialized,
if the MCU powers off before achieving the trigger voltage,
the previous state remains intact as the ISR is not triggered.
Otherwise, all the registers are reinstated and the checkpoint
flag is cleared. QUICKRECALL resumes by re-entering the ISR
(Figure 2). The ISR returns, the program context is popped
from the stack, and the program continues execution oblivious
to the power interruption.
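The two boot sequences and the checkpoint ISR described in this subsection can be sketched as a small simulation. This is a simplified model rather than the authors' firmware: the class `ToyTpc`, its method names, and the `reaches_vtrig` parameter are hypothetical, the register file is reduced to PC, SP, and SR, and the ISR saves all registers in one step before setting the flag.

```python
class ToyTpc:
    """Toy model of the normal and recall boot sequences (not the authors' firmware)."""

    def __init__(self):
        # Contents survive power loss; models data placed in FRAM.
        self.fram = {"checkpoint_flag": False, "saved_regs": None, "counter": 0}
        # Contents are lost on power loss; models the CPU register file.
        self.regs = {"PC": 0, "SP": 0, "SR": 0}
        self.log = []

    def init_peripherals(self):
        # Stands in for the programmer-supplied initialization routine.
        self.log.append("init")

    def boot(self, reaches_vtrig=True):
        self.regs = {"PC": 0, "SP": 0, "SR": 0}  # registers hold no useful state
        if not self.fram["checkpoint_flag"]:
            # Normal boot: initialize and start the program from the beginning.
            self.init_peripherals()
            self.fram["counter"] = 0
            self.log.append("normal")
            return
        # Recall boot: restore the stack pointer first, then re-initialize.
        self.regs["SP"] = self.fram["saved_regs"]["SP"]
        self.init_peripherals()
        if not reaches_vtrig:
            # Power was lost while stalled below the trigger voltage;
            # the checkpoint in FRAM is left untouched.
            self.log.append("stalled")
            return
        self.regs = dict(self.fram["saved_regs"])
        self.fram["checkpoint_flag"] = False
        self.log.append("recall")

    def run(self, steps):
        for _ in range(steps):
            self.fram["counter"] += 1  # program data is updated in place
            self.regs["PC"] += 1

    def low_voltage_isr(self):
        # Save the registers, then mark the checkpoint as valid.
        self.fram["saved_regs"] = dict(self.regs)
        self.fram["checkpoint_flag"] = True


tpc = ToyTpc()
tpc.boot()                     # first power-up: normal boot
tpc.run(3)
tpc.low_voltage_isr()          # supply dropped below the trigger voltage
tpc.boot(reaches_vtrig=False)  # short burst that never reaches the trigger voltage
tpc.boot()                     # next burst: recall boot
tpc.run(2)
print(tpc.fram["counter"], tpc.regs["PC"], tpc.log)
```

Setting the flag only after the registers are saved means that a power loss in the middle of the ISR leaves the flag unset in this model, so the next boot falls back to the normal sequence instead of recalling a partial checkpoint.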
QUICKRECALL supports all normal programming paradigms
including dynamic memory allocation and nested interrupts.
Dynamic memory allocation requires no additional performance
overhead as the heap is also retained in-situ in the non-volatile
memory. The data in the heap is stored as a linked-list
structure in the FRAM. The memory allocation engine stores
the control variables used to keep track of free and allocated
heap segments in the bss. Since QUICKRECALL retains the
state of bss across power cycles, the heap and the memory
allocation engine work seamlessly across power cycles without
presenting any overhead. Enabling nested interrupts allows
the QUICKRECALL ISR to be triggered. Note that nested
interrupts are not enabled for the QUICKRECALL interrupt
vector itself while it performs checkpointing.
We modified the linker to allocate the data, bss, stack,
and heap sections to the on-chip FRAM. Note that while the
system reboots across power cycles, the global variables should
not be initialized again. Hence they are defined in the bss
section of the code. The initialization routine that configures
the MCU and peripherals, such as setting GPIO directions, clock
frequency, etc., is defined in a function (say foo()). foo()
is invoked in both the main() function and the QUICKRECALL
boot sequence. Lastly, we modified the boot sequence and the
environment pre-initialization routines as shown in Figure 2 to
implement QUICKRECALL.
V. DESIGN IMPLEMENTATION
This section describes the implementation and experimental
setup for QUICKRECALL.
The choice of a suitable Vtrig is crucial for QUICKRECALL
to avoid unwanted wait periods and incomplete checkpoints.
The MSP430FR5739 has a non-programmable internal Supply
Voltage Supervisor (SVS) that monitors Vdd and regulates
the voltage to the microcontroller core at a constant 1.5V.
Figure 4 is a conceptual graph that shows the state of the
microcontroller as Vdd changes. The comparator monitors
Vdd and its output proctors the program execution
window. In Figure 4, shaded region A denotes the region
where Vdd is less than the SVSon voltage. The internal SVS
keeps the microcontroller powered off in this region. B shows
the region where the microcontroller is powered on but
program execution is stalled. In this region, Vdd is below the
predefined Vtrig and the program waits, as a supply voltage of
at least Vtrig is necessary to guarantee data retention. Region
C denotes the window when the program executes. As soon
as Vdd drops below Vtrig, the QUICKRECALL interrupt is
triggered and the microcontroller operation moves from region
C to D. In D, the program executes the ISR to save the
system state and any remaining time in this region is spent on
monitoring the supply voltage. The microcontroller is switched
off once Vdd drops below SVSoff.
Vtrig has to be greater than both SVSoff and SVSon
since they dictate the microcontroller on-off states. For
the MSP430FR5739 microcontroller, the typical voltages for
SVSoff and SVSon are 1.88V and 1.93V respectively. The
minimum voltage required for safe FRAM operation is 2.0V
[4]. The chosen Vtrig has to guarantee correct FRAM operation
for the duration of checkpointing. The overhead of storing
a checkpoint at a CPU frequency of 8MHz, measured using
an oscilloscope, is 8.18µs. For our experimental setup, the
Fig. 4. Microcontroller State with Vdd
B. Determining Vtrig
A. Experimental setup
Fig. 3. Experimental Setup
Figure 3 shows our experimental setup. We use the Texas
Instruments MSP-EXP430FR5739 Experimenter's Board [11]
for implementing QUICKRECALL. The board is equipped with
an MSP430FR5739 microcontroller that has 1KB of SRAM
and 16KB of FRAM [4]. An Analog Devices comparator
(CMP401) is interfaced with the GPIO pins to provide a
digital signal output after comparing a reference Vtrig to the
microcontroller's Vdd. To supply a variable Vdd, we used
a function generator and supplied a square wave at varying
frequencies and duty cycles. The observed positive supply
voltage gradient for the function generator was 1000V/s.
TABLE I
PROGRAM EXECUTION TIME (CPU FREQ = 8MHz)
Program
QUICKRECALL Overhead
for waking up the embedded platform, stabilizing the voltage
regulator and PLLs, and includes the overhead for recalling the
microcontroller and the peripheral state. The duration of the
initialization overhead is application and platform dependent.
For example, SENSE has a longer wake-up overhead due to
the time required for the accelerometer to settle. Restoring
overhead is the time taken to restore the checkpointed data.
For QUICKRECALL, this refers to the time required to populate
the GPRs, SR, SP, and PC registers upon power up. Table I
shows that QUICKRECALL introduces constant overheads
for storing and restoring operations for each power cycle.
Comparatively, for flash-based checkpointing, the data has
to be transferred to and from the SRAM and this overhead
depends on the stack depth, number of global variables, etc.
For example, storing 100 bytes of data in flash adds a further
8ms overhead to checkpointing. In contrast, QUICKRECALL
employs FRAM to implement in-situ checkpointing for the
stack, bss, etc., and thus adds zero overhead for them. Table I shows
that the overhead related to data transfer is a constant 12.6µs
for QUICKRECALL. This is an improvement of 100x-1000x
over conventional checkpointing schemes using flash². Thus,
QUICKRECALL maximizes the time utilized for meaningful
computation in each power cycle. The total runtime given in
Table I corresponds to the time taken by a program to complete
one execution in a single life cycle.
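As a worked check of the Table I figures, the short sketch below adds up the store and restore overheads and expresses the per-power-cycle overhead as a fraction of a computation window. The function names are illustrative, and the fraction assumes that the whole window is otherwise available for computation, which ignores any time spent waiting for the supply to reach the trigger voltage.

```python
# Per-power-cycle overheads taken from Table I (CPU frequency 8MHz).
STORE_US = 8.18      # store (checkpointing) overhead, microseconds
RESTORE_US = 4.4     # restore overhead, microseconds
INIT_MS = {"CRC": 1.854, "RSA": 1.854, "SENSE": 20.0}  # initialization overhead


def transfer_overhead_us():
    """Data-transfer overhead per power cycle: store plus restore."""
    return STORE_US + RESTORE_US


def per_cycle_overhead_ms(program):
    """Total overhead per power cycle, including initialization."""
    return INIT_MS[program] + transfer_overhead_us() / 1000.0


def overhead_fraction(program, window_ms):
    """Share of one computation window consumed by overhead (simplified)."""
    return per_cycle_overhead_ms(program) / window_ms


print(f"transfer overhead: {transfer_overhead_us():.2f} us")
for name in INIT_MS:
    share = overhead_fraction(name, 50.0)
    print(f"{name}: {per_cycle_overhead_ms(name):.3f} ms per cycle, "
          f"{share:.1%} of a 50 ms window")
```

The store and restore terms sum to 12.58µs, which matches the 12.6µs quoted in the text after rounding. The initialization term dominates the per-cycle overhead for all three programs, most of all for SENSE.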
Figure 5 compares the normalized runtime for each program.
The baseline system (normalized value of 1) is the microcontroller
system, using unified FRAM memory, executing the
program across a single computation window. Figure 5 shows
that the total execution time for QUICKRECALL single life
cycle is the same even when SRAM is used as the data memory.
We implement a conventional checkpointing scheme (henceforth
referred to as Checkpoint), which uses trigger points for
voltage comparison. Since the MSP430FR5739 does not have
flash memory, we use computed values of flash erase and
write latencies for the MSP430F2132 [9] employed in WISP,
which is used to evaluate Mementos [1]. We note that the flash
read/write timing characteristic is independent of the microcontroller
architecture and depends only on the memory device
architecture. We assume zero overhead for reading the data
back from flash to SRAM. The flash architecture that we
consider contains 2 segments of 512 bytes each, which can
be used for checkpointing. The erase operation is performed
when the flash segment is exhausted. Using this data, we create
approximate versions of the loop-latch and function-return
modes of Mementos for Checkpoint. As discussed in Section
II, checkpointing schemes introduce trigger points, which add
overhead to program execution. The choice between the loop-latch
mode and function-return mode has a strong dependence on the
application program and its structure. For example, the slowdowns
for the same CRC program in function-return mode and loop-latch
mode were 1.1x and 18x respectively. Comparatively,
QUICKRECALL does not add any overhead to normal program
execution irrespective of the program structure. QUICKRECALL
avoids program re-execution by a simple choice of trigger
voltage which guarantees that the ISR has enough power to
successfully complete checkpointing. The results in Figure
5 include a conventional checkpointing scheme with trigger
Program | QUICKRECALL Overhead (a)  | Total Runtime
CRC     | 8.18µs + 1.854ms + 4.4µs | 551ms
RSA     | 8.18µs + 1.854ms + 4.4µs | 11.12s
SENSE   | 8.18µs + 20ms + 4.4µs    | 79ms
(a) Store Overhead + Initialization Overhead + Restore Overhead
rate of decay of Vdd, observed once the power supply is
cut off, is $e^{-17.56t}$. Using capacitor discharge equations, we
determine a Vtrig of 2.0003V for successfully implementing
QUICKRECALL.
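The Vtrig value can be reproduced with a short calculation. The sketch below is one plausible reading of the capacitor discharge argument, not necessarily the authors' exact procedure: it assumes Vdd follows Vtrig · e^(-17.56 t) with t in seconds once the supply is cut, and that Vdd must stay at or above the 2.0V FRAM minimum for the 8.18µs needed to store a checkpoint. The function name `min_trigger_voltage` is illustrative.

```python
import math

V_FRAM_MIN = 2.0     # V, minimum voltage for safe FRAM operation [4]
DECAY_RATE = 17.56   # 1/s, observed decay constant of Vdd (assumed unit)
T_STORE = 8.18e-6    # s, time to store a checkpoint at 8MHz


def min_trigger_voltage(v_min, decay_rate, t_hold):
    """Smallest starting voltage that is still >= v_min after t_hold seconds.

    Solves v_trig * exp(-decay_rate * t_hold) = v_min for v_trig.
    """
    return v_min * math.exp(decay_rate * t_hold)


v_trig = min_trigger_voltage(V_FRAM_MIN, DECAY_RATE, T_STORE)
print(f"Vtrig >= {v_trig:.4f} V")
```

Under these assumptions the result rounds to 2.0003V, in agreement with the value reported above, and it also lies above the typical SVSon and SVSoff thresholds of 1.93V and 1.88V.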
VI. EXPERIMENTAL RESULTS
Next, we present our experimental results and compare
QUICKRECALL with a state-of-the-art checkpointing solution.
A. Definitions
1) Computation Window: For our experimental setup,
Computation Window (CW) is defined as the time for which the
MCU is in the ON state. This corresponds to regions B, C,
and D in Figure 4. It is important to note that we use a square
wave as described in Section V-A.
2) Slowdown: Slowdown is defined as the ratio of the time taken
by the program to complete an execution across multiple power
cycles to the time taken by the same code to complete executing
in a single run, without any loss in power. Mathematically, if
the application takes n power cycles to complete its execution,
and the duration of the i-th power cycle is given by $CW_i$,

$$\mathrm{Slowdown} = \frac{\sum_{i=1}^{n} CW_i}{\mathit{TotalRuntime}} \tag{1}$$
Slowdown happens due to the overhead presented by checkpointing
schemes to store and restore the system snapshot. Note
that in the above definition, the amount of time the MCU
is in the OFF state does not contribute to the calculation of
slowdown.
3) Single Life Cycle: We define the execution of a program
in a single continuous run in the absence of power loss as a
single life cycle execution of the program.
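The slowdown metric of Equation (1) translates directly into code. In the sketch below the function name and the input numbers are purely illustrative and are not measurements from this paper.

```python
def slowdown(computation_windows_ms, total_runtime_ms):
    """Equation (1): sum of ON-time windows divided by the single-run runtime.

    OFF time between windows is deliberately not part of the input.
    """
    if total_runtime_ms <= 0:
        raise ValueError("total_runtime_ms must be positive")
    return sum(computation_windows_ms) / total_runtime_ms


# Illustrative numbers only: four 50 ms windows for a program that needs
# 160 ms when run without interruption.
example = slowdown([50.0, 50.0, 50.0, 50.0], 160.0)
print(example)
```

A program that completes within a single window of exactly its own runtime has a slowdown of 1, which is the lower bound of the metric.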
B. Results
To evaluate QUICKRECALL, three test programs were used,
namely CRC, RSA and SENSE. CRC calculates a 16-bit CRC
and a 32-bit CRC of a message using polynomials. RSA
does a 64-bit encryption on 128 characters. The program then
decrypts the encrypted value and verifies correctness. SENSE
senses accelerometer data, processes it using a low pass filter,
and then performs statistical computations such as finding the
minimum, maximum, mean, and standard deviation of the
collected data. SENSE implements nested interrupts as well
as dynamic memory allocation on the heap¹.
The overhead introduced by QUICKRECALL per power cycle
and the single life cycle execution time for each test program
are given in Table I. QUICKRECALL overhead comprises
checkpointing (storing) overhead and wake-up overhead. Wake-up overhead comprises an initialization overhead and a restoring overhead. Initialization overhead denotes the time spent
¹ Mementos, a checkpointing scheme for TPCs, does not support dynamic
memory allocation.
² When the flash is not being erased in a power cycle, the only overhead for
conventional checkpointing schemes is the write operation.
[Fig. 5 plot: Normalized Runtime (y-axis) for CRC, RSA, and SENSE; series: SRAM Single Life cycle, QuickRecall Single Life cycle, Checkpoint Single Life cycle, QuickRecall 50ms CW, Checkpoint 50ms CW]
[Fig. 6 plot: Slowdown (x) (y-axis) versus Computation Window (ms) (x-axis, 10 to 100); series: QuickRecall Slowdown, Checkpointing Slowdown; annotation: "Region where conventional Checkpointing does not work"]
Fig. 5. Execution Time Comparison
Fig. 6. RSA Slowdown with QuickRecall Single Lifecycle as Baseline
points inserted according to the program. Function-return mode

was used for CRC while loop-latch mode was implemented for
both RSA and SENSE.
The computation window was set to 50ms by feeding
a square wave to the Vdd of the experimenter board from
a function generator. Figure 5 compares the slowdown of
QUICKRECALL with Checkpoint. Our results show that for all
test cases, the overhead of inserted trigger points in Checkpoint
single life cycle is more than that of a power-cycled QUICKRECALL
implementation with a 50ms computation window.
QUICKRECALL has similar slowdowns for both CRC and
RSA with a 50ms computation window. This is due to the
same overhead per power cycle for both programs, as
shown in Table I. On the other hand, due to the variation
in stack depth for CRC and RSA at each checkpoint, the
overhead and thus the slowdown is significantly different for
the two programs for Checkpoint. SENSE has a larger slowdown
for both QUICKRECALL and Checkpoint. This is due
to the overhead presented during wake-up by the initialization
routine. The larger overhead consumes a significant portion of
the computation window. Therefore, more power cycles
are required to complete program execution. For flash-based
systems, more power cycles mean more checkpointing
operations and hence, more erase operations depending upon
the checkpoint data size.
Figure 6 compares the slowdown of QUICKRECALL with
Checkpoint when executing RSA. The computation window is
varied by using the function generator. A duty cycle of 8% is
maintained throughout the experiment. Predictably, QUICKRECALL
does not slow down the program as much as Checkpoint
and its slowdown is almost 1 for larger computation windows. Additionally,
as Figure 6 shows, due to the large overhead incurred for
Checkpoint, it cannot guarantee correct operation without
re-executions for computation windows less than 15ms, which is
the minimum time required for an erase and write operation.
On the other hand, QUICKRECALL works for computation
windows as small as 5ms without re-executions. The extremely
low overhead of QUICKRECALL gives a 3x improvement in the
computation window size for which the program can execute.
This is a major step in enabling TPCs to perform computations
in power-deficient conditions.
which completes a complex computation across power cycles

without re-execution at any stage. Our work enables transiently
powered computers to do computations even when they receive
power for periods as low as 5ms.
ACKNOWLEDGMENT
This work was supported in part by the National Science Foundation (NSF) under grants CNS-0953468 and CCF-1018358. The opinions expressed here represent those of the
authors and not necessarily of NSF.
REFERENCES
[1] B. Ransford, J. Sorber, and K. Fu, "Mementos: system support
for long-running computation on RFID-scale devices," SIGPLAN Not.,
vol. 46, no. 3, pp. 159-170, Mar. 2011. [Online]. Available:
http://doi.acm.org/10.1145/1961296.1950386
[2] B. Ransford, S. Clark, M. Salajegheh, and K. Fu, "Getting things
done on computational RFIDs with energy-aware checkpointing and
voltage-aware scheduling," in Proceedings of the 2008 conference on
Power aware computing and systems, ser. HotPower'08. Berkeley,
CA, USA: USENIX Association, 2008, pp. 5-5. [Online]. Available:
http://dl.acm.org/citation.cfm?id=1855610.1855615
[3] Y. Yang, L. Wang, D. K. Noh, H. K. Le, and T. F. Abdelzaher,
"Solarstore: enhancing data reliability in solar-powered storage-centric
sensor networks," in Proceedings of the 7th international conference
on Mobile systems, applications, and services, ser. MobiSys '09.
New York, NY, USA: ACM, 2009, pp. 333-346. [Online]. Available:
http://doi.acm.org/10.1145/1555816.1555850
[4] "MSP430FR573x datasheet," Texas Instruments, April 2013. [Online].
Available: http://www.ti.com/lit/ds/symlink/msp430fr5739.pdf
[5] J. S. Plank, M. Beck, G. Kingsley, and K. Li, "Libckpt: transparent
checkpointing under unix," in Proceedings of the USENIX 1995
Technical Conference Proceedings, ser. TCON'95. Berkeley, CA,
USA: USENIX Association, 1995, pp. 18-18. [Online]. Available:
http://dl.acm.org/citation.cfm?id=1267411.1267429
[6] J. S. Plank, "An overview of checkpointing in uniprocessor and distributed
systems, focusing on implementation and performance," Knoxville, TN,
USA, Tech. Rep., 1997.
[7] S. Baek, J. Choi, D. Lee, and S. H. Noh, "Energy-efficient and
high-performance software architecture for storage class memory," ACM Trans.
Embed. Comput. Syst., vol. 12, no. 3, pp. 81:1-81:22, Apr. 2013. [Online].
Available: http://doi.acm.org/http://dx.doi.org/10.1145/2442116.2442131
[8] A. Mirhoseini, E. Songhori, and F. Koushanfar, "Idetic: A high-level
synthesis approach for enabling long computations on transiently-powered
ASICs," in Pervasive Computing and Communications (PerCom), 2013
IEEE International Conference on, 2013, pp. 216-224.
[9] "MSP430F21x2 datasheet SLAS578J," Texas Instruments, January 2012.
[Online]. Available: http://www.ti.com/lit/ds/symlink/msp430f2132.pdf
[10] G. R. Fox, F. Chu, and T. Davenport, Current and future ferroelectric
nonvolatile memory technology, Journal of Vacuum Science and Technology B, vol. 19, no. 5, 2001.
[11] Msp-exp430fr5739
fram
experimenter
board
user
guide,
Texas
Instruments,
January
2013.
[Online].
Available:
http://www.ti.com/lit/ug/slau343b/slau343b.pdf
VII. C ONCLUSION
In this work, we have successfully implemented and demonstrated Q UICK R ECALL, a scheme which minimizes the checkpointing overhead by 100x-1000x in each power cycle and
335
