Live Migration of Virtual Machines
We have implemented high-performance migration support for Xen [1], a freely available open source VMM for commodity hardware. Our design and implementation addresses the issues and tradeoffs involved in live local-area migration. Firstly, as we are targeting the migration of active OSes hosting live services, it is critically important to minimize the downtime during which services are entirely unavailable. Secondly, we must consider the total migration time, during which state on both machines is synchronized and which hence may affect reliability. Furthermore, we must ensure that migration does not unnecessarily disrupt active services through resource contention (e.g., CPU, network bandwidth) with the migrating OS.

Our implementation addresses all of these concerns, allowing for example an OS running the SPECweb benchmark to migrate across two physical hosts with only 210ms unavailability, or an OS running a Quake 3 server to migrate with just 60ms downtime. Unlike application-level restart, we can maintain network connections and application state during this process, hence providing effectively seamless migration from a user's point of view.

We achieve this by using a pre-copy approach in which pages of memory are iteratively copied from the source machine to the destination host, all without ever stopping the execution of the virtual machine being migrated. Page-level protection hardware is used to ensure a consistent snapshot is transferred, and a rate-adaptive algorithm is used to control the impact of migration traffic on running services. The final phase pauses the virtual machine, copies any remaining pages to the destination, and resumes execution there. We eschew a 'pull' approach which faults in missing pages across the network, since this adds a residual dependency of arbitrarily long duration, as well as providing in general rather poor performance.

Our current implementation does not address migration across the wide area, nor does it include support for migrating local block devices, since neither of these is required for our target problem space. However, we discuss ways in which such support can be provided in Section 7.

2 Related Work

The Collective project [3] has previously explored VM migration as a tool to provide mobility to users who work on different physical hosts at different times, citing as an example the transfer of an OS instance to a home computer while a user drives home from work. Their work aims to optimize for slow (e.g., ADSL) links and longer time spans, and so stops OS execution for the duration of the transfer, with a set of enhancements to reduce the transmitted image size. In contrast, our efforts are concerned with the migration of live, in-service OS instances on fast networks with only tens of milliseconds of downtime. Other projects that have explored migration over longer time spans by stopping and then transferring include Internet Suspend/Resume [4] and µDenali [5].

Zap [6] uses partial OS virtualization to allow the migration of process domains (pods), essentially process groups, using a modified Linux kernel. Their approach is to isolate all process-to-kernel interfaces, such as file handles and sockets, into a contained namespace that can be migrated. Their approach is considerably faster than results in the Collective work, largely due to the smaller units of migration. However, migration in their system is still on the order of seconds at best, and does not allow live migration; pods are entirely suspended, copied, and then resumed. Furthermore, they do not address the problem of maintaining open connections for existing services.

The live migration system presented here has considerable shared heritage with the previous work on NomadBIOS [7], a virtualization and migration system built on top of the L4 microkernel [8]. NomadBIOS uses pre-copy migration to achieve very short best-case migration downtimes, but makes no attempt at adapting to the writable working set behavior of the migrating OS.

VMware has recently added OS migration support, dubbed VMotion, to their VirtualCenter management software. As this is commercial software and strictly disallows the publication of third-party benchmarks, we are only able to infer its behavior through VMware's own publications. These limitations make a thorough technical comparison impossible. However, based on the VirtualCenter User's Manual [9], we believe their approach is generally similar to ours and would expect it to perform to a similar standard.

Process migration, a hot topic in systems research during the 1980s [10, 11, 12, 13, 14], has seen very little use for real-world applications. Milojicic et al. [2] give a thorough survey of possible reasons for this, including the problem of the residual dependencies that a migrated process retains on the machine from which it migrated. Examples of residual dependencies include open file descriptors, shared memory segments, and other local resources. These are undesirable because the original machine must remain available, and because they usually negatively impact the performance of migrated processes.

For example, Sprite [15] processes executing on foreign nodes require some system calls to be forwarded to the home node for execution, leading to at best reduced performance and at worst widespread failure if the home node is unavailable. Although various efforts were made to ameliorate performance issues, the underlying reliance on the availability of the home node could not be avoided. A similar fragility occurs with MOSIX [14], where a deputy process on the home node must remain available to support remote execution.
We believe the residual dependency problem cannot easily be solved in any process migration scheme – even modern mobile run-times such as Java and .NET suffer from problems when network partition or machine crash causes class loaders to fail. The migration of entire operating systems inherently involves fewer or zero such dependencies, making it more resilient and robust.

Memory transfer can be split into a push phase (pre-copying pages while the VM continues to run), a stop-and-copy phase, and a pull phase (faulting pages across the network on demand); most practical solutions select one or two of the three. For example, pure stop-and-copy [3, 4, 5] involves halting the original VM, copying all pages to the destination, and then starting the new VM. This has advantages in terms of simplicity, but means that both downtime and total migration time are proportional to the amount of physical memory allocated to the VM. This can lead to an unacceptable outage if the VM is running a live service.
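That proportionality is easy to quantify: the outage for pure stop-and-copy is simply the VM's memory size divided by the transfer bandwidth. A one-line sketch, using the VM size and slowest link speed evaluated later in the paper (and treating one Mbit as 2^20 bits so the result matches the figures quoted in Section 4.2):

```python
# Pure stop-and-copy downtime = VM memory / transfer bandwidth.
# 512MB of VM memory over a 128 Mbit/sec link (Mbit taken as 2**20 bits):
vm_memory_bits = 512 * 1024 * 1024 * 8
link_bits_per_sec = 128 * 1024 * 1024
print(vm_memory_bits / link_bits_per_sec)   # 32.0 seconds of downtime
```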
For network resources, we want a migrated OS to maintain all open network connections without relying on forwarding mechanisms on the original host (which may be shut down following migration), or on support from mobility or redirection mechanisms that are not already present (as in [6]). A migrating VM will include all protocol state (e.g. TCP PCBs), and will carry its IP address with it.

In a cluster environment, the network interfaces of the source and destination machines typically exist on a single switched LAN. Our solution for managing migration with respect to network in this environment is to generate an unsolicited ARP reply from the migrated host, advertising that the IP has moved to a new location. This will reconfigure peers to send packets to the new physical address, and while a very small number of in-flight packets may be lost, the migrated domain will be able to continue using open connections with almost no observable interference.

Some routers are configured not to accept broadcast ARP replies (in order to prevent IP spoofing), so an unsolicited ARP may not work in all scenarios. If the operating system is aware of the migration, it can opt to send directed replies only to interfaces listed in its own ARP cache, to remove the need for a broadcast. Alternatively, on a switched network, the migrating OS can keep its original Ethernet MAC address, relying on the network switch to detect its move to a new port¹.

¹ Note that on most Ethernet controllers, hardware MAC filtering will have to be disabled if multiple addresses are in use (though some cards support filtering of multiple addresses in hardware), and so this technique is only practical for switched networks.
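An unsolicited (gratuitous) ARP reply of the kind described above takes only a few lines of user-level code once the VM has resumed on the destination. The sketch below is not the paper's in-kernel implementation; it is a minimal illustration using the third-party scapy library, and the interface name, IP address and MAC address are placeholders.

```python
# Minimal sketch of advertising a migrated IP address with an
# unsolicited (gratuitous) ARP reply, using the scapy library.
# The interface, IP and MAC values below are hypothetical.
from scapy.all import ARP, Ether, sendp

def advertise_moved_ip(ip: str, new_mac: str, iface: str = "eth0") -> None:
    """Broadcast an ARP reply claiming that `ip` now lives at `new_mac`."""
    reply = Ether(dst="ff:ff:ff:ff:ff:ff", src=new_mac) / ARP(
        op=2,            # 2 = ARP reply ("is-at")
        psrc=ip,         # sender protocol address: the migrated IP
        hwsrc=new_mac,   # sender hardware address: the new host's NIC
        pdst=ip,         # target fields mirror the sender for a gratuitous ARP
        hwdst="ff:ff:ff:ff:ff:ff",
    )
    sendp(reply, iface=iface, verbose=False)

if __name__ == "__main__":
    advertise_moved_ip("10.0.0.42", "00:16:3e:00:00:01")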
In the cluster, the migration of storage may be similarly addressed: most modern data centers consolidate their storage requirements using a network-attached storage (NAS) device, in preference to using local disks in individual servers. NAS has many advantages in this environment, including simple centralised administration, widespread vendor support, and reliance on fewer spindles leading to a reduced failure rate. A further advantage for migration is that it obviates the need to migrate disk storage, as the NAS is uniformly accessible from all host machines in the cluster. We do not address the problem of migrating local-disk storage in this paper, although we suggest some possible strategies as part of our discussion of future work.

3.3 Design Overview

The logical steps that we execute when migrating an OS are summarized in Figure 1. We take a conservative approach to the management of migration with regard to safety and failure handling. Although the consequences of hardware failures can be severe, our basic principle is that safe migration should at no time leave a virtual OS more exposed to system failure than when it is running on the original single host. To achieve this, we view the migration process as a transactional interaction between the two hosts involved:

Stage 0: Pre-Migration. We begin with an active VM on physical host A. To speed any future migration, a target host may be preselected where the resources required to receive migration will be guaranteed.

Stage 1: Reservation. A request is issued to migrate an OS from host A to host B. We initially confirm that the necessary resources are available on B and reserve a VM container of that size. Failure to secure resources here means that the VM simply continues to run on A unaffected.

Stage 2: Iterative Pre-Copy. During the first iteration, all pages are transferred from A to B. Subsequent iterations copy only those pages dirtied during the previous transfer phase.

Stage 3: Stop-and-Copy. We suspend the running OS instance at A and redirect its network traffic to B. As described earlier, CPU state and any remaining inconsistent memory pages are then transferred. At the end of this stage there is a consistent suspended copy of the VM at both A and B. The copy at A is still considered to be primary and is resumed in case of failure.

Stage 4: Commitment. Host B indicates to A that it has successfully received a consistent OS image. Host A acknowledges this message as commitment of the migration transaction: host A may now discard the original VM, and host B becomes the primary host.

Stage 5: Activation. The migrated VM on B is now activated. Post-migration code runs to reattach device drivers to the new machine and advertise moved IP addresses.

[Figure 1: Migration timeline. The VM runs normally on host A through Stage 0 (Pre-Migration: an alternate physical host may be preselected, block devices mirrored and free resources maintained) and Stage 1 (Reservation: a container is initialized on the target host); during Stage 2 (Iterative Pre-Copy) shadow paging is enabled and dirty pages are copied in successive rounds, incurring copying overhead; after Stage 3 (Stop-and-Copy), Stage 4 (Commitment) releases the VM state on host A, and in Stage 5 (Activation) the VM starts on host B, connects to local devices and resumes normal operation.]
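The transactional structure of Stages 0–5 can be made concrete with a short coordinator sketch. This is not the Xen control software itself; it is an illustrative outline in which `source`, `dest` and their methods stand for hypothetical control interfaces whose transport, reservation and suspend/resume primitives are assumed to exist elsewhere.

```python
# Illustrative sketch of the Stage 0-5 migration transaction.
# `source` and `dest` are hypothetical control handles for the two hosts.

def migrate(source, dest, vm):
    # Stage 1: Reservation. If B cannot reserve a container, the VM
    # simply keeps running on A.
    if not dest.reserve_container(vm.memory_size):
        return "aborted: no resources on destination"

    try:
        # Stage 2: Iterative pre-copy. Round 1 sends every page;
        # later rounds send only pages dirtied during the previous round.
        dirty = set(range(vm.num_pages))
        while not source.precopy_should_stop(dirty):
            dest.receive_pages(source.read_pages(dirty))
            dirty = source.fetch_and_clear_dirty_bitmap()

        # Stage 3: Stop-and-copy. A keeps a consistent suspended copy
        # and remains primary until commitment.
        source.suspend(vm)
        dest.receive_pages(source.read_pages(dirty))
        dest.receive_cpu_state(source.read_cpu_state())

        # Stage 4: Commitment. Only after B acknowledges a consistent
        # image may A discard its copy.
        dest.commit()
        source.discard(vm)
    except Exception:
        # Any failure before commitment resumes the VM locally on A.
        source.resume(vm)
        raise

    # Stage 5: Activation. B reattaches devices and advertises the
    # migrated IP address (e.g. via an unsolicited ARP reply).
    dest.activate(vm)
    return "migrated"
```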
[Figure 2: WWS curve for a complete run of SPEC CINT2000 (512MB VM). The plot shows the number of 4KB pages dirtied in each 8-second interval (y-axis, up to roughly 80,000 pages) against elapsed time in seconds (x-axis, 0–12,000), annotated with the sub-benchmarks gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2 and twolf.]
This approach to failure management ensures that at least one host has a consistent VM image at all times during migration. It depends on the assumption that the original host remains stable until the migration commits, and that the VM may be suspended and resumed on that host with no risk of failure. Based on these assumptions, a migration request essentially attempts to move the VM to a new host, and on any sort of failure execution is resumed locally, aborting the migration.

4 Writable Working Sets

When migrating a live operating system, the most significant influence on service performance is the overhead of coherently transferring the virtual machine's memory image. As mentioned previously, a simple stop-and-copy approach will achieve this in time proportional to the amount of memory allocated to the VM. Unfortunately, during this time any running services are completely unavailable.

A more attractive alternative is pre-copy migration, in which the memory image is transferred while the operating system (and hence all hosted services) continues to run. The drawback, however, is the wasted overhead of transferring memory pages that are subsequently modified, and hence must be transferred again. For many workloads there will be a small set of memory pages that are updated very frequently, and which it is not worth attempting to maintain coherently on the destination machine before stopping and copying the remainder of the VM.

The fundamental question for iterative pre-copy migration is: how does one determine when it is time to stop the pre-copy phase because too much time and resource is being wasted? Clearly if the VM being migrated never modifies memory, a single pre-copy of each memory page will suffice to transfer a consistent image to the destination. However, should the VM continuously dirty pages faster than the rate of copying, then all pre-copy work will be in vain and one should immediately stop and copy.

In practice, one would expect most workloads to lie somewhere between these extremes: a certain (possibly large) set of pages will seldom or never be modified and hence are good candidates for pre-copy, while the remainder will be written often and so should best be transferred via stop-and-copy – we dub this latter set of pages the writable working set (WWS) of the operating system, by obvious extension of the original working set concept [17].

In this section we analyze the WWS of operating systems running a range of different workloads in an attempt to obtain some insight to allow us to build heuristics for an efficient and controllable pre-copy implementation.

4.1 Measuring Writable Working Sets

To trace the writable working set behaviour of a number of representative workloads we used Xen's shadow page tables (see Section 5) to track dirtying statistics on all pages used by a particular executing operating system. This allows us to determine within any time period the set of pages written to by the virtual machine.

Using the above, we conducted a set of experiments to sample the writable working set size for a variety of benchmarks.
[Figure 3: Expected downtime due to last-round memory copy on traced page dirtying of a Linux kernel compile. Three panels ("Effect of Bandwidth and Pre-Copy Iterations on Migration Downtime") correspond to migration throughputs of 128, 256 and 512 Mbit/sec; each plots the rate of page dirtying (pages/sec) and the expected downtime (sec) against elapsed time over a roughly 600-second run.]

[Figure 4: Expected downtime due to last-round memory copy on traced page dirtying of the OLTP database benchmark, with the same three bandwidth panels over a roughly 1200-second run.]

[Figure 5: Expected downtime due to last-round memory copy on traced page dirtying of a Quake 3 server, with the same three bandwidth panels over a roughly 500-second run.]

[Figure 6: Expected downtime due to last-round memory copy on traced page dirtying of SPECweb, with the same three bandwidth panels over a roughly 700-second run.]
Xen was running on a dual processor Intel Xeon 2.4GHz machine, and the virtual machine being measured had a memory allocation of 512MB. In each case we started the relevant benchmark in one virtual machine and read the dirty bitmap every 50ms from another virtual machine, cleaning it every 8 seconds – in essence this allows us to compute the WWS with a (relatively long) 8 second window, but estimate it at a finer (50ms) granularity.

The benchmarks we ran were SPEC CINT2000, a Linux kernel compile, the OSDB OLTP benchmark using PostgreSQL, and SPECweb99 using Apache. We also measured a Quake 3 server, as we are particularly interested in highly interactive workloads.

Figure 2 illustrates the writable working set curve produced for the SPEC CINT2000 benchmark run. This benchmark involves running a series of smaller programs in order and measuring the overall execution time. The x-axis measures elapsed time, and the y-axis shows the number of 4KB pages of memory dirtied within the corresponding 8 second interval; the graph is annotated with the names of the sub-benchmark programs.

From this data we observe that the writable working set varies significantly between the different sub-benchmarks. For programs such as 'eon' the WWS is a small fraction of the total working set and hence is an excellent candidate for migration. In contrast, 'gap' has a consistently high dirtying rate and would be problematic to migrate. The other benchmarks go through various phases but are generally amenable to live migration. Thus performing a migration of an operating system will give different results depending on the workload and the precise moment at which migration begins.

4.2 Estimating Migration Effectiveness

We observed that we could use the trace data acquired to estimate the effectiveness of iterative pre-copy migration for various workloads. In particular we can simulate a particular network bandwidth for page transfer, determine how many pages would be dirtied during a particular iteration, and then repeat for successive iterations. Since we know the approximate WWS behaviour at every point in time, we can estimate the overall amount of data transferred in the final stop-and-copy round and hence estimate the downtime.

Figures 3–6 show our results for the four remaining workloads. Each figure comprises three graphs, each of which corresponds to a particular network bandwidth limit for page transfer; each individual graph shows the WWS histogram (in light gray) overlaid with four line plots estimating service downtime for up to four pre-copying rounds.

Looking at the topmost line (one pre-copy iteration), the first thing to observe is that pre-copy migration always performs considerably better than naive stop-and-copy. For a 512MB virtual machine this latter approach would require 32, 16, and 8 seconds downtime for the 128Mbit/sec, 256Mbit/sec and 512Mbit/sec bandwidths respectively. Even in the worst case (the starting phase of SPECweb), a single pre-copy iteration reduces downtime by a factor of four. In most cases we can expect to do considerably better – for example both the Linux kernel compile and the OLTP benchmark typically experience a reduction in downtime of at least a factor of sixteen.

The remaining three lines show, in order, the effect of performing a total of two, three or four pre-copy iterations prior to the final stop-and-copy round. In most cases we see an increased reduction in downtime from performing these additional iterations, although with somewhat diminishing returns, particularly in the higher bandwidth cases.

This is because all the observed workloads exhibit a small but extremely frequently updated set of 'hot' pages. In practice these pages will include the stack and local variables being accessed within the currently executing processes as well as pages being used for network and disk traffic. The hottest pages will be dirtied at least as fast as we can transfer them, and hence must be transferred in the final stop-and-copy phase. This puts a lower bound on the best possible service downtime for a particular benchmark, network bandwidth and migration start time.

This interesting tradeoff suggests that it may be worthwhile increasing the amount of bandwidth used for page transfer in later (and shorter) pre-copy iterations. We will describe our rate-adaptive algorithm based on this observation in Section 5, and demonstrate its effectiveness in Section 6.

5 Implementation Issues

We designed and implemented our pre-copying migration engine to integrate with the Xen virtual machine monitor [1]. Xen securely divides the resources of the host machine amongst a set of resource-isolated virtual machines, each running a dedicated OS instance. In addition, there is one special management virtual machine used for the administration and control of the machine.

We considered two different methods for initiating and managing state transfer. These illustrate two extreme points in the design space: managed migration is performed largely outside the migratee, by a migration daemon running in the management VM; in contrast, self migration is implemented almost entirely within the migratee OS, with only a small stub required on the destination machine.

In the following sections we describe some of the implementation details of these two approaches. We describe how we use dynamic network rate-limiting to effectively
balance network contention against OS downtime. We then proceed to describe how we ameliorate the effects of rapid page dirtying, and describe some performance enhancements that become possible when the OS is aware of its migration – either through the use of self migration, or by adding explicit paravirtualization interfaces to the VMM.

5.1 Managed Migration

Managed migration is performed by migration daemons running in the management VMs of the source and destination hosts. These are responsible for creating a new VM on the destination machine, and coordinating transfer of live system state over the network.

When transferring the memory image of the still-running OS, the control software performs rounds of copying in which it performs a complete scan of the VM's memory pages. Although in the first round all pages are transferred to the destination machine, in subsequent rounds this copying is restricted to pages that were dirtied during the previous round, as indicated by a dirty bitmap that is copied from Xen at the start of each round.

During normal operation the page tables managed by each guest OS are the ones that are walked by the processor's MMU to fill the TLB. This is possible because guest OSes are exposed to real physical addresses, and so the page tables they create do not need to be mapped to physical addresses by Xen.

To log pages that are dirtied, Xen inserts shadow page tables underneath the running OS. The shadow tables are populated on demand by translating sections of the guest page tables. Translation is very simple for dirty logging: all page-table entries (PTEs) are initially read-only mappings in the shadow tables, regardless of what is permitted by the guest tables. If the guest tries to modify a page of memory, the resulting page fault is trapped by Xen. If write access is permitted by the relevant guest PTE then this permission is extended to the shadow PTE. At the same time, we set the appropriate bit in the VM's dirty bitmap.
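Xen performs this dirty logging in C on the hypervisor's page-fault path; the sketch below only restates the decision taken on each write fault, with the page-table entries and bitmap reduced to simple Python stand-ins.

```python
# Sketch of the shadow-page-table dirty-logging decision described above.
# `guest_pte` and `shadow_pte` are simplified stand-ins for real PTEs.

def handle_write_fault(guest_pte, shadow_pte, dirty_bitmap, pfn):
    """Called when the guest writes to a page whose shadow PTE is read-only."""
    if not guest_pte.writable:
        # The guest itself forbids the write: deliver a normal page fault.
        return "inject_fault_to_guest"
    # The write is legitimate; it was blocked only for dirty logging.
    shadow_pte.writable = True    # propagate the guest's write permission
    dirty_bitmap[pfn] = 1         # record the page as dirtied this round
    return "retry_instruction"
```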
When the bitmap is copied to the control software at the start of each pre-copying round, Xen's bitmap is cleared and the shadow page tables are destroyed and recreated as the migratee OS continues to run. This causes all write permissions to be lost: all pages that are subsequently updated are then added to the now-clear dirty bitmap.

When it is determined that the pre-copy phase is no longer beneficial, using heuristics derived from the analysis in Section 4, the OS is sent a control message requesting that it suspend itself in a state suitable for migration. This causes the OS to prepare for resumption on the destination machine; Xen informs the control software once the OS has done this. The dirty bitmap is scanned one last time for remaining inconsistent memory pages, and these are transferred to the destination together with the VM's checkpointed CPU-register state.

Once this final information is received at the destination, the VM state on the source machine can safely be discarded. Control software on the destination machine scans the memory map and rewrites the guest's page tables to reflect the addresses of the memory pages that it has been allocated. Execution is then resumed by starting the new VM at the point that the old VM checkpointed itself. The OS then restarts its virtual device drivers and updates its notion of wallclock time.

Since the transfer of pages is OS agnostic, we can easily support any guest operating system – all that is required is a small paravirtualized stub to handle resumption. Our implementation currently supports Linux 2.4, Linux 2.6 and NetBSD 2.0.

5.2 Self Migration

In contrast to the managed method described above, self migration [18] places the majority of the implementation within the OS being migrated. In this design no modifications are required either to Xen or to the management software running on the source machine, although a migration stub must run on the destination machine to listen for incoming migration requests, create an appropriate empty VM, and receive the migrated system state.

The pre-copying scheme that we implemented for self migration is conceptually very similar to that for managed migration. At the start of each pre-copying round every page mapping in every virtual address space is write-protected. The OS maintains a dirty bitmap tracking dirtied physical pages, setting the appropriate bits as write faults occur. To discriminate migration faults from other possible causes (for example, copy-on-write faults, or access-permission faults) we reserve a spare bit in each PTE to indicate that it is write-protected only for dirty-logging purposes.

The major implementation difficulty of this scheme is to transfer a consistent OS checkpoint. In contrast with a managed migration, where we simply suspend the migratee to obtain a consistent checkpoint, self migration is far harder because the OS must continue to run in order to transfer its final state. We solve this difficulty by logically checkpointing the OS on entry to a final two-stage stop-and-copy phase. The first stage disables all OS activity except for migration and then performs a final scan of the dirty bitmap, clearing the appropriate bit as each page is transferred. Any pages that are dirtied during the final scan, and that are still marked as dirty in the bitmap, are copied to a shadow buffer. The second and final stage then transfers the contents of the shadow buffer – page updates are ignored during this transfer.
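The two-stage final transfer can be summarised in a short sketch. The `send_page` primitive and the in-memory structures are placeholders; the real implementation runs inside the migrating guest kernel rather than as user-level Python.

```python
# Sketch of the two-stage stop-and-copy used by self migration.
# `send_page(pfn, data)` and `memory[pfn]` are placeholder primitives.

def final_two_stage_copy(dirty_bitmap, memory, send_page):
    # Logical checkpoint: from here on, all OS activity except migration
    # is disabled (not shown). Writes that still occur re-set bitmap bits
    # via the dirty-logging fault handler.
    shadow_buffer = {}

    # Stage 1: scan the dirty bitmap, sending pages and clearing bits.
    for pfn, dirty in enumerate(dirty_bitmap):
        if dirty:
            send_page(pfn, memory[pfn])
            dirty_bitmap[pfn] = 0

    # Pages dirtied during stage 1 (still marked dirty afterwards) are
    # copied aside so their contents can no longer change underneath us.
    for pfn, dirty in enumerate(dirty_bitmap):
        if dirty:
            shadow_buffer[pfn] = bytes(memory[pfn])

    # Stage 2: transfer the shadow buffer; further page updates are ignored.
    for pfn, data in shadow_buffer.items():
        send_page(pfn, data)
```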
5.3 Dynamic Rate-Limiting

The analysis in Section 4 showed that we must eventually pay in the form of an extended downtime, because the hottest pages in the writable working set are not amenable to pre-copy migration. The downtime can be reduced by increasing the bandwidth limit, albeit at the cost of additional network contention. Our solution to this impasse is to dynamically adapt the bandwidth limit during each pre-copy round.

[Figure: number of 4kB pages transferred in each pre-copy iteration, for iterations 0–17.]
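The surviving text does not include the algorithm's details, but the idea stated in Section 4.2 – spend little bandwidth on the early, long rounds and progressively more on the later, shorter ones – can be sketched as follows. The specific policy below (transferring slightly faster than the previously observed dirtying rate, within administrator-chosen bounds, and stopping when the remainder fits a downtime budget) is an illustrative reconstruction, not the exact published algorithm.

```python
PAGE_BITS = 4096 * 8  # assumes 4KB pages

def next_round_rate(prev_dirty_pages, prev_round_secs,
                    min_rate_mbps, max_rate_mbps, headroom_mbps=50):
    """Pick a bandwidth limit (Mbit/sec) for the next pre-copy round.

    Illustrative policy: send a little faster than the rate at which the
    VM dirtied memory during the previous round, clamped to [min, max].
    The headroom constant is an assumption for this sketch.
    """
    dirty_mbps = prev_dirty_pages * PAGE_BITS / (prev_round_secs * 1e6)
    return max(min_rate_mbps, min(max_rate_mbps, dirty_mbps + headroom_mbps))

def should_stop_precopy(remaining_bytes, rate_mbps, max_rate_mbps,
                        target_downtime_secs=0.3):
    """Stop pre-copying once the remainder could be sent within the
    downtime budget, or once the bandwidth ceiling has been reached."""
    projected_downtime = remaining_bytes * 8 / (rate_mbps * 1e6)
    return (projected_downtime <= target_downtime_secs
            or rate_mbps >= max_rate_mbps)
```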
One must be careful not to stun important interactive services.

Freeing Page Cache Pages. A typical operating system will have a number of 'free' pages at any time, ranging from truly free (page allocator) to cold buffer cache pages. When informed a migration is to begin, the OS can simply return some or all of these pages to Xen in the same way it would when using the ballooning mechanism described in [1]. This means that the time taken for the first "full pass" iteration of pre-copy migration can be reduced, sometimes drastically. However, should the contents of these pages be needed again, they will need to be faulted back in from disk, incurring greater overall cost.

The first pass transfers 776MB and lasts for 62 seconds, at which point the migration algorithm described in Section 5 increases its rate over several iterations and finally suspends the VM after a further 9.8 seconds. The final stop-and-copy phase then transfers the remaining pages, and the web server resumes at full rate after a 165ms outage.

This simple example demonstrates that a highly loaded server can be migrated with both controlled impact on live services and a short downtime. However, the working set of the server in this case is rather small, and so this should be expected to be a relatively easy case for live migration.

6.3 Complex Web Workload: SPECweb99
[Figure: Effect of migration on web server transmission rate (512KB files, 100 concurrent clients; throughput sampled over 100ms and 500ms intervals). Throughput drops from roughly 870 Mbit/sec to 765 Mbit/sec during the 62-second first pre-copy pass, and to 694 Mbit/sec during the further iterations lasting 9.8 seconds, with a 165ms total downtime.]

[Figure: Iterative progress of live migration for SPECweb99 with 350 clients (90% of max load) and an 800MB VM; total data transmitted 960MB (×1.20). The area of each bar shows the VM memory transferred and the memory dirtied during that iteration. The first iteration is a long, relatively low-rate transfer in which 676.8MB are transferred in 54.1 seconds; these early phases allow non-writable working set data to be transferred with a low impact on active services. Further rounds transfer 126.7MB and 39.0MB, and the remaining intermediate rounds between roughly 14MB and 28.4MB each, at increasing rates. In the final iteration the domain is suspended and the remaining 18.2MB of dirty pages are sent; in addition to the 201ms required to copy this last round of data, an additional 9ms elapse while the VM starts up, for a total downtime of 210ms.]
The migration begins with a long period of low-rate transmission as a first pass is made through the memory of the virtual machine. This first round takes 54.1 seconds and transmits 676.8MB of memory. Two more low-rate rounds follow, transmitting 126.7MB and 39.0MB respectively, before the transmission rate is increased.

The remainder of the graph illustrates how the adaptive algorithm tracks the page dirty rate over successively shorter iterations before finally suspending the VM. When suspension takes place, 18.2MB of memory remains to be sent. This transmission takes 201ms, after which an additional 9ms is required for the domain to resume normal execution.

The total downtime of 210ms experienced by the SPECweb clients is sufficiently brief to maintain the 350 conformant clients. This result is an excellent validation of our approach: a heavily (90% of maximum) loaded server is migrated to a separate physical host with a total migration time of seventy-one seconds. Furthermore, the migration does not interfere with the quality of service demanded by SPECweb's workload. This illustrates the applicability of migration as a tool for administrators of demanding live services.

6.4 Low-Latency Server: Quake 3

Another representative application for hosting environments is a multiplayer on-line game server.
[Figure 10: Effect on packet response time of migrating a running Quake 3 server VM. Packet flight times (up to about 0.12 seconds) are plotted against elapsed time for two successive migrations, with downtimes of 50ms and 48ms.]

[Figure 11: Iterative progress of live migration for a Quake 3 server with 6 clients and a 64MB VM; total data transmitted 88MB (×1.37). The area of each bar shows the VM memory transferred and the memory dirtied during each iteration; intermediate rounds transfer between roughly 0.1MB and 1.6MB each. The final iteration leaves only 148KB of data to transmit; in addition to the 20ms required to copy this last round, an additional 40ms are spent on start-up overhead, for a total downtime of 60ms.]
To determine the effectiveness of our approach in this case we configured a virtual machine with 64MB of memory running a Quake 3 server. Six players joined the game and started to play within a shared arena, at which point we initiated a migration to another machine. A detailed analysis of this migration is shown in Figure 11.

The trace illustrates a generally similar progression as for SPECweb, although in this case the amount of data to be transferred is significantly smaller. Once again the transfer rate increases as the trace progresses, although the final stop-and-copy phase transfers so little data (148KB) that the full bandwidth is not utilized.

Overall, we are able to perform the live migration with a total downtime of 60ms. To determine the effect of migration on the live players, we performed an additional experiment in which we migrated the running Quake 3 server twice and measured the inter-arrival time of packets received by clients. The results are shown in Figure 10. As can be seen, from the client point of view migration manifests itself as a transient increase in response time of 50ms. In neither case was this perceptible to the players.

6.5 A Diabolical Workload: MMuncher

As a final point in our evaluation, we consider the situation in which a virtual machine is writing to memory faster than it can be transferred across the network. We test this diabolical case by running a 512MB host with a simple C program that writes constantly to a 256MB region of memory. The results of this migration are shown in Figure 12.

In the first iteration of this workload, we see that half of the memory has been transmitted, while the other half is immediately marked dirty by our test program. Our algorithm attempts to adapt to this by scaling itself relative to the perceived initial rate of dirtying; this scaling proves insufficient.
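The test program itself is described above only as a constant writer over a 256MB region; an equivalent sketch (in Python here, for consistency with the other examples, rather than the C program actually used) simply touches every page of such a buffer as fast as possible:

```python
# Equivalent sketch of the "diabolical" writer: continuously dirty every
# 4KB page of a 256MB region. (The paper's actual test program is in C.)
buf = bytearray(256 * 1024 * 1024)
PAGE = 4096
counter = 0
while True:
    for offset in range(0, len(buf), PAGE):
        buf[offset] = counter & 0xFF   # one write per page marks it dirty
    counter += 1
```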
[Figure 12: Iterative progress of live migration for the diabolical workload – a 512MB VM with constant writes to a 256MB region; total data transmitted 638MB (×1.25).]

7 Future Work

Although our solution is well-suited for the environment we have targeted – a well-connected data-center or cluster with network-accessed storage – there are a number of areas in which we hope to carry out future work. This would allow us to extend live migration to wide-area networks, and to environments that cannot rely solely on network-attached storage.

7.2 Wide Area Network Redirection

7.3 Migrating Block Devices

Although NAS prevails in the modern data center, some environments may still make extensive use of local disks. These present a significant problem for migration as they are usually considerably larger than volatile memory. If the entire contents of a disk must be transferred to a new host before migration can complete, then total migration times may be intolerably extended.
8 Conclusion

By integrating live OS migration into the Xen virtual machine monitor we enable rapid movement of interactive workloads within clusters and data centers. Our dynamic network-bandwidth adaptation allows migration to proceed with minimal impact on running services, while reducing total downtime to below discernable thresholds.

Our comprehensive evaluation shows that realistic server workloads such as SPECweb99 can be migrated with just 210ms downtime, while a Quake 3 game server is migrated with an imperceptible 60ms outage.

References

[1] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In Proceedings of the nineteenth ACM Symposium on Operating Systems Principles (SOSP19), pages 164–177. ACM Press, 2003.

[2] D. Milojicic, F. Douglis, Y. Paindaveine, R. Wheeler, and S. Zhou. Process migration. ACM Computing Surveys, 32(3):241–299, 2000.

[3] C. P. Sapuntzakis, R. Chandra, B. Pfaff, J. Chow, M. S. Lam, and M. Rosenblum. Optimizing the migration of virtual computers. In Proc. of the 5th Symposium on Operating Systems Design and Implementation (OSDI-02), December 2002.

[4] M. Kozuch and M. Satyanarayanan. Internet suspend/resume. In Proceedings of the IEEE Workshop on Mobile Computing Systems and Applications, 2002.

[5] Andrew Whitaker, Richard S. Cox, Marianne Shaw, and Steven D. Gribble. Constructing services with interposable virtual hardware. In Proceedings of the First Symposium on Networked Systems Design and Implementation (NSDI '04), 2004.

[6] S. Osman, D. Subhraveti, G. Su, and J. Nieh. The design and implementation of Zap: A system for migrating computing environments. In Proc. 5th USENIX Symposium on Operating Systems Design and Implementation (OSDI-02), pages 361–376, December 2002.

[7] Jacob G. Hansen and Asger K. Henriksen. Nomadic operating systems. Master's thesis, Dept. of Computer Science, University of Copenhagen, Denmark, 2002.

[8] Hermann Härtig, Michael Hohmuth, Jochen Liedtke, and Sebastian Schönberg. The performance of microkernel-based systems. In Proceedings of the sixteenth ACM Symposium on Operating System Principles, pages 66–77. ACM Press, 1997.

[9] VMware, Inc. VMware VirtualCenter Version 1.2 User's Manual. 2004.

[10] Michael L. Powell and Barton P. Miller. Process migration in DEMOS/MP. In Proceedings of the ninth ACM Symposium on Operating System Principles, pages 110–119. ACM Press, 1983.

[11] Marvin M. Theimer, Keith A. Lantz, and David R. Cheriton. Preemptable remote execution facilities for the V-system. In Proceedings of the tenth ACM Symposium on Operating System Principles, pages 2–12. ACM Press, 1985.

[12] Eric Jul, Henry Levy, Norman Hutchinson, and Andrew Black. Fine-grained mobility in the Emerald system. ACM Trans. Comput. Syst., 6(1):109–133, 1988.

[13] Fred Douglis and John K. Ousterhout. Transparent process migration: Design alternatives and the Sprite implementation. Software – Practice and Experience, 21(8):757–785, 1991.

[14] A. Barak and O. La'adan. The MOSIX multicomputer operating system for high performance cluster computing. Journal of Future Generation Computer Systems, 13(4-5):361–372, March 1998.

[15] J. K. Ousterhout, A. R. Cherenson, F. Douglis, M. N. Nelson, and B. B. Welch. The Sprite network operating system. IEEE Computer, 21(2), 1988.

[16] E. Zayas. Attacking the process migration bottleneck. In Proceedings of the eleventh ACM Symposium on Operating Systems Principles, pages 13–24. ACM Press, 1987.

[17] Peter J. Denning. Working sets past and present. IEEE Transactions on Software Engineering, SE-6(1):64–84, January 1980.

[18] Jacob G. Hansen and Eric Jul. Self-migration of operating systems. In Proceedings of the 11th ACM SIGOPS European Workshop (EW 2004), pages 126–130, 2004.

[19] C. E. Perkins and A. Myles. Mobile IP. Proceedings of International Telecommunications Symposium, pages 415–419, 1997.

[20] Alex C. Snoeren and Hari Balakrishnan. An end-to-end approach to host mobility. In Proceedings of the 6th Annual International Conference on Mobile Computing and Networking, pages 155–166. ACM Press, 2000.