
A Survey of Present Research on Virtual Cluster

Computer Architecture Group, Chemnitz University of Technology, September 10, 2010

Abstract
As the demand for computing power keeps growing, modern computing clusters expand in both performance and size by comprising more and more hardware, which requires significant effort for system maintenance. Meanwhile, although the application of virtual machines to high performance computing is considered an effective way to simplify hardware management issues, it has not gained as much use in HPC as expected. The aim of this survey is to identify the factors that limit the application of virtual machines in HPC and to lay the groundwork for the virtual cluster project by providing relatively comprehensive, up-to-date information on current research related to virtual machines in the context of a large cluster. This survey covers a broad range of information sources, namely conference proceedings, journals, doctoral dissertations and papers available on the Internet. We summarize key papers and list research groups undertaking projects that we feel are worth taking note of. From the gathered information we identify unsolved problems and potential areas of future work.

Contents
1 Completed research work
  1.1 Introduction
  1.2 OS-customization and Ease of Management
  1.3 Performance and Resource Isolation
  1.4 Checkpoint/Restart and Fault Tolerance
  1.5 VM Migration
2 Bottlenecks and Challenges
  2.1 Performance Overhead
  2.2 Incompatibility problem with Paravirtualization
  2.3 Challenges for VM application on HPC
3 Batch System for Virtual Machines
4 Opportunities for our research
References

1 Completed research work


1.1 Introduction
The application of virtual machines to high performance computing offers many desirable features, such as system security, ease of management, OS customization, performance isolation, checkpoint/restart and live migration, which can be very beneficial for the performance as well as the maintenance of a computing cluster. Below, we give a brief survey of some of the ongoing and completed work dealing with various aspects of virtual machines in high performance computing:

1.2 OS-customization and Ease of Management


Limited by the physical hardware, a traditional high performance computing cluster can offer only a specific operating system and set of libraries, leaving users no freedom to make their own choice. Besides, management of virtual machines on a large-scale computing cluster poses considerable challenges today. By their very nature, virtual machines applied to high performance computing can remove such limitations by enabling the instantiation of new, independently configured guest operating systems on multiple efficient, isolated virtual machines over a single physical machine. [1]

Shi et al. [2] introduce the design of VNIX, a system very helpful for computing cluster management. VNIX provides a more flexible way to manage virtual machines in a complex cluster environment, and it also provides a whole set of tools for monitoring, deploying and controlling virtual machines. These features can be very useful for administrators to improve resource utilization and the isolation of various users on practical high performance computing clusters or computing grids.

Grit et al. [3] focus on architectural and algorithmic issues of resource management policy and present Shirako, a system for on-demand leasing of shared networked resources in clusters. This system enables a flexible way of distributing resource-management functions across the participants in a federated system, and it can accommodate a range of distributed virtual computing models. In addition, they extend Shirako to provide fine-grained virtual machine slivers and to drive virtual machine migration.

As energy cost becomes a concern for high performance computing clusters, Verma et al. [4] focus on the use of power management techniques for high performance computing applications on modern power-efficient servers with virtualization support.

They consider techniques such as dynamic consolidation and the dynamic power range enabled by low-power states on servers. Moreover, they identify application isolation and the virtualization overhead with multiple virtual machines as the key bottlenecks for server consolidation. With these insights, they introduce a framework and methodology for power-aware placement of high performance computing applications.

K. Begnum et al. [5] suggest three new metrics with which the state of a virtualization infrastructure can be described and analyzed, in order to incorporate virtual machine management more closely into policy. The metrics have been implemented in the latest release of their MLN (Manage Large Networks) tool for virtual machine management.

A remarkable effort introduced by R. G. Minnich et al. [6] is Megatux, a set of tools under development with which one million Linux virtual machines were booted on the Thunderbird cluster (4660 nodes) and 555,000 on the Hyperion cluster (1024 nodes). In contrast to existing high performance computing systems, Megatux allows all systems to be booted at the same time, with static configuration files defining the role of each node.

1.3 Performance and Resource Isolation


When multiple virtual machines run on the same physical hardware, the resource consumption of a virtual machine may span several driver domains. A misbehaving virtual machine may impact the performance of other, well-behaving ones because of hardware sharing and concurrency. This is undesirable for high performance computing, which depends on stable mutual cooperation among the nodes it hosts. Besides, security and reliability of the system are common concerns from a user's point of view. Performance isolation was therefore introduced to eliminate this negative impact by means of resource isolation.

J. N. Matthews et al. [7] present the design of a performance isolation benchmark suite that measures the degree to which a virtualization system limits such negative influence. Their work demonstrates the isolation properties of the different virtualization approaches: full virtualization protects well against misbehaving virtual machines in all stress tests, and paravirtualization offers excellent resource isolation as well, while the results for operating-system-level virtualization vary from case to case due to its tightly coupled design.

Based on Xen, D. Gupta et al. [8] suggest a set of primitives that address this issue by accurately allocating, accounting and measuring resource consumption per VM. Their evaluation indicates effectively enforced performance isolation for a variety of workloads and configurations.

T. Cucinotta et al. [9] tackle the problem of providing Quality of Service guarantees to virtualized applications for both computing and networking. They suggest a possible implementation of a good level of isolation between concurrently running virtual machines, based on real-time CPU scheduling at the virtualization layer.
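To make the methodology concrete, the following sketch mimics the structure of such an isolation test at the process level, in plain Python rather than across real guests: a fixed workload is timed alone and then again next to a deliberately misbehaving neighbour. The workload and the neighbour are invented stand-ins, not part of the benchmark suite of [7], so the numbers only illustrate the measurement idea, not any hypervisor's isolation properties.

```python
# Process-level sketch of an isolation measurement: time a fixed workload
# alone, then while "misbehaving" neighbours saturate every CPU core.
import multiprocessing as mp
import time

def well_behaved_workload():
    """A fixed amount of CPU work whose runtime we measure."""
    s = 0
    for i in range(5_000_000):
        s += i * i
    return s

def misbehaving_neighbour(stop):
    """Spin the CPU until told to stop, simulating a resource hog."""
    while not stop.is_set():
        pass

def timed_run():
    t0 = time.perf_counter()
    well_behaved_workload()
    return time.perf_counter() - t0

if __name__ == "__main__":
    baseline = timed_run()

    stop = mp.Event()
    hogs = [mp.Process(target=misbehaving_neighbour, args=(stop,))
            for _ in range(mp.cpu_count())]
    for h in hogs:
        h.start()
    contended = timed_run()
    stop.set()
    for h in hogs:
        h.join()

    # Perfect isolation would keep the slowdown ratio near 1.0.
    print(f"baseline {baseline:.3f}s, contended {contended:.3f}s, "
          f"slowdown x{contended / baseline:.2f}")
```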

1.4 Checkpoint/Restart and Fault Tolerance


Fault tolerance and reliability issues can limit application scalability, especially for large-scale computing clusters. Checkpoint/restart enabled by virtual machines is regarded as an effective solution to issues such as process migration, crash recovery/rollback transactions and system administration. With crash recovery and rollback transactions, a process can easily return to a previously checkpointed state, which is especially useful for long-running applications such as scientific computations; incremental checkpointing can be used to reduce the overhead. System administrators can checkpoint processes before shutting down a machine and restart them after the machine is up again, or on another machine. Although checkpoint/restart is useful, it still remains mainly a research subject rather than something adopted in production. The reasons are: [10]

- Lack of support from popular operating systems: most operating systems, such as Unix, were not designed for checkpoint/restart, and it is very hard to add such functionality without significant modification of the kernel.
- Lack of commercial demand: checkpoint/restart is primarily used for high performance distributed systems, and there is not yet a large demand in the broader computing market.
- Transparency and reliability: checkpoint/restart should be both transparent and reliable for general use, which is difficult.

Zhong et al. [10] developed CRAK, a general-purpose checkpoint/restart package for popular operating systems like Linux. CRAK enables transparent migration of Linux networked applications and computing environments, including network sockets, without modifying, recompiling or relinking applications or the operating system. Chanchio et al. [11] propose the CEVM (Checkpointing-Enabled Virtual Machine) system, which enables implicit system-level fault tolerance, likewise without modifying existing operating systems or applications, while minimizing space and time overheads. An analysis report by B. J. Kim [12] draws a comprehensive comparison of several important checkpoint systems. Although CRAK and BLCR are relatively commonly used checkpoint systems, neither supports virtualization. Zap completes CRAK by adding an operating-system virtualization layer; it is one of the few checkpoint/restart systems supporting transparent migration of legacy and networked applications. However, the implementation of Zap is more complex and fragile than that of VM migration. [13] Instead, K. Lee [14] demonstrates the better performance MIGSOCK gains over Zap and hence suggests a hybrid system of MIGSOCK and Zap as a viable approach to checkpointing and restarting networked processes.
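As a side note on the incremental checkpointing mentioned above, the following toy sketch shows the core idea: persist only the pages that changed since the last checkpoint. The page hashing here is an invented stand-in for the kernel- or hypervisor-level dirty-page tracking a real system would use.

```python
# Toy incremental checkpoint: compare per-page hashes against the previous
# checkpoint and keep only the pages that differ.
import hashlib

PAGE = 4096

def page_hashes(memory: bytearray):
    return [hashlib.sha256(memory[i:i + PAGE]).digest()
            for i in range(0, len(memory), PAGE)]

def incremental_checkpoint(memory: bytearray, last_hashes):
    """Return (dirty-page dict, new hashes); dirty pages would go to disk."""
    new_hashes = page_hashes(memory)
    dirty = {}
    for idx, h in enumerate(new_hashes):
        if last_hashes is None or h != last_hashes[idx]:
            dirty[idx] = bytes(memory[idx * PAGE:(idx + 1) * PAGE])
    return dirty, new_hashes

memory = bytearray(16 * PAGE)                        # pretend process state
full, hashes = incremental_checkpoint(memory, None)  # first one is full
memory[3 * PAGE] = 0xFF                              # dirty a single page
delta, hashes = incremental_checkpoint(memory, hashes)
print(len(full), "pages in full checkpoint,", len(delta), "in incremental")
```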

M. H. Sun et al. [15] implement a fast, lightweight virtual machine checkpointing solution for Xen based on copy-on-write techniques. It outperforms the next best solution, which uses a pre-copy-based strategy.
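The copy-on-write idea can be illustrated at the process level with a short sketch: fork() hands the child a COW view of the parent's memory, so the snapshot is written out concurrently while the parent keeps running, and the kernel copies only the pages the parent subsequently modifies. This is merely an analogue of what a hypervisor-level solution like [15] does with guest pages, not its actual mechanism.

```python
# POSIX-only process-level analogue of copy-on-write checkpointing.
import os
import pickle

state = {"step": 0, "data": list(range(1000))}   # state worth checkpointing

def checkpoint(path):
    """Snapshot `state` from a forked child while the parent keeps running."""
    pid = os.fork()
    if pid == 0:                  # child: sees a frozen COW view of memory
        with open(path, "wb") as f:
            pickle.dump(state, f)
        os._exit(0)
    return pid                    # parent: returns immediately

cp = checkpoint("/tmp/ckpt.bin")  # relies on fork(), so not Windows
state["step"] += 1                # parent mutates; kernel copies those pages
os.waitpid(cp, 0)
with open("/tmp/ckpt.bin", "rb") as f:
    snap = pickle.load(f)
print("snapshot step:", snap["step"], "| live step:", state["step"])
```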

1.5 VM Migration
Migration is one of the most useful features of virtual machine technology; performance, management and fault tolerance can all benefit from it. It enables running operating system instances to be migrated across distinct physical nodes to achieve load balance, with little performance impact on hosted applications. Besides, resources can be exploited more fully by migrating processes from one node to another. Huang et al. [16] propose a high performance virtual machine migration design using RDMA (Remote Direct Memory Access); the total migration overhead is drastically reduced with its help. Moreover, live migration of virtual machines allows workload movement with a short service downtime. In his dissertation, dedicated to a design of virtual machine mobility without adding functions to the hosting environment, J. G. Hansen contributes an algorithm that can live-migrate a running operating system between two physical hosts with very little downtime and without any support from the underlying virtual machine monitor or host environment. Building on this, he introduces a cluster-management system that supports job mobility with a high degree of flexibility; on top of this system, running real-life workloads and rapidly instantiating jobs using hypercube network-forks become a reality. [17] Other work in this area includes the following. Nomad [18], a design for migrating OS-bypass networks in virtual cluster environments, migrates network resources efficiently even in environments with stringent communication performance requirements. An implementation of fast transparent migration for virtual machines [19] allows an entire running virtual machine to be migrated from one physical node to another in a manner completely transparent to the application, OS and remote clients; the method used to migrate physical memory is critical here. In contrast to the traditional pre-copy approach, an implementation of post-copy live migration of virtual machines [20] can reduce the total migration time while maintaining the liveness of the virtual machine.
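To contrast the two strategies just mentioned, here is a minimal simulation of the iterative pre-copy loop; the dirty-page behaviour is synthetic and the parameters are invented, so the numbers only illustrate why pre-copy converges when the writable working set is small.

```python
# Toy simulation of iterative pre-copy live migration: push all pages while
# the VM runs, re-send the pages it dirtied in rounds, and pause only for a
# short stop-and-copy once the dirty set is small. Post-copy would instead
# resume on the target immediately and pull pages on demand.
import random

def precopy_migrate(num_pages, redirty_fraction=0.2,
                    max_rounds=30, stop_threshold=8):
    """Return (rounds, total pages sent, pages sent during downtime)."""
    dirty = set(range(num_pages))          # round 0: every page is "dirty"
    sent = 0
    for rounds in range(1, max_rounds + 1):
        sent += len(dirty)                 # push the dirty set, VM running
        # while sending, the running VM re-dirties a fraction of those pages
        dirty = {p for p in dirty if random.random() < redirty_fraction}
        if len(dirty) <= stop_threshold:
            break                          # small enough for stop-and-copy
    sent += len(dirty)                     # pause VM, send final dirty set
    return rounds, sent, len(dirty)

rounds, sent, downtime_pages = precopy_migrate(10_000)
print(f"{rounds} rounds, {sent} pages sent, {downtime_pages} in downtime")
```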

Last but not least, with a glance at security in virtual machine live migration, J. Oberheide et al. [21] identify three classes of threats to virtual machine migration: control plane, data plane and migration module threats. Based on this experience, strategies for reinforcing the security of virtualization software and of the live migration process are presented.

2 Bottlenecks and Challenges


2.1 Performance Overhead
The benefits of virtual machines are obtained at the cost of performance overhead, caused by adding a virtualization layer between the operating system instances and the native hardware. One solution in modern virtual machine monitors such as Xen and VMware allows guest virtual machines to access hardware directly without intervention of the virtual machine monitor, but such direct-access optimizations are not always possible. Based on the concept of OS-bypass, Huang proposed the application of a similar concept, VMM-bypass, which allows guest OSes to access the physical hardware without intervention from the virtual machine monitor wherever possible. To achieve a comparatively low overhead, Huang studied the following aspects [22]:

- reducing the network I/O virtualization overhead in modern computing systems with high-speed interconnects and multi-core architectures;
- reducing the cost of inter-VM communication on the same physical host, which is especially important for multi-core architectures;
- designing efficient middleware so as to let applications run transparently in VM-based environments;
- further reducing management overhead for VM-based computing environments with the help of modern high-speed interconnects.

To a certain extent, the application of VMM-bypass removed a major performance bottleneck for virtual machines. They use an inter-VM communication library [22, 23] to support shared-memory communication between distinct virtual machines on the same physical host in a Xen environment. Further improvements are achieved through a one-copy communication protocol and an RDMA (Remote Direct Memory Access)-based high performance virtual machine migration design; RDMA not only speeds up the migration process but also simplifies its management. Using the KVM hypervisor with QEMU as its userspace device emulator, F. Diakhate et al. [24] address inter-VM communication through the design and implementation of a virtual device. Tests demonstrate near-native performance in an MPI ping-pong benchmark, even outperforming the native MPI implementation, without privileged code on the host.
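As a process-level illustration of the shared-memory communication idea, the sketch below exchanges a message between two co-located parties through a shared segment instead of a network path. Two OS processes stand in for the co-resident VMs; a real inter-VM channel, as in [22, 23, 24], would map guest pages through the hypervisor rather than use multiprocessing.shared_memory.

```python
# Ping-pong through a shared memory segment: the message never touches a
# network stack, which is the point of shared-memory inter-VM channels.
from multiprocessing import Process, shared_memory, Semaphore

def pong(name, ready, done):
    shm = shared_memory.SharedMemory(name=name)
    ready.acquire()                    # wait until the peer has written
    shm.buf[:4] = b"pong"              # reply in place, no copy over a NIC
    done.release()
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=4096)
    ready, done = Semaphore(0), Semaphore(0)
    p = Process(target=pong, args=(shm.name, ready, done))
    p.start()
    shm.buf[:4] = b"ping"              # "send": write into the shared page
    ready.release()
    done.acquire()
    print(bytes(shm.buf[:4]))          # b'pong'
    p.join()
    shm.close()
    shm.unlink()
```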

2.2 Incompatibility problem with Paravirtualization


The requirement of kernel modification for a Xen-based virtual infrastructure may not be desirable; this inherent incompatibility is the biggest drawback of paravirtualization. Although VMware, the leader in x86 virtualization, provides an on-the-fly binary translator, the paravirtualized approach is still favored by academic research projects. Recent improvements to x86 processors from both Intel and AMD, with their respective Vanderpool and Pacifica features, offer direct-execution virtualization, so the limitation is likely to be removed with the use of new processors. [25]

2.3 Challenges for VM application on HPC


Challenges for applying virtual machines to high performance computing can be classified as follows:

- development of a virtualization solution suitable for high performance computing,
- development of tools and methods for the management of virtual systems, and
- use of advanced capabilities enabled by virtualization, such as virtual machine pause/unpause, virtual machine checkpoint/restart, and virtual machine migration.

The solution proposed by S. L. Scott et al. [26] is a newly developed virtual machine solution for high performance computing with a small footprint and a set of fault-tolerance mechanisms. Their system management tools are able to take advantage of the capabilities provided by system-level virtualization. Correspondingly, their fault-tolerance effort led to the implementation of a framework for new reactive, proactive, or hybrid fault-tolerance policies, based on capabilities such as virtual machine pause/unpause, checkpoint/restart, and migration, while the system management efforts led to an integrated solution for the management of virtual systems.

3 Batch System for Virtual Machines


A batch system accepts computational jobs at a central access point and distributes the individual jobs to one or more nodes within the cluster; the batch system itself wraps a job rather than executing it. Among the many efforts towards applying batch systems to control virtual machines on a cluster, the following work deserves special mention.

In 2005, W. Emeneker et al. designed and began to implement VMRM (virtual machine resource manager), aimed mainly at high performance computing applications with many nodes in a single execution. VMRM is designed to support many actions that atomically change the state of the system by creating, destroying, preserving, restoring,

checkpointing, and migrating virtual machines; atomic operations ensure that each operation leaves the system in a consistent state. In addition, more than one type of virtual machine, including Xen, VMware and QEMU, may be hosted by VMRM. Another particular feature of VMRM is support for queries, which respond with information about the state of the system, statistics, and predictions about operations. Queries can determine how many virtual machines are in use, which nodes can support the different types of VMs, and predict how long a provisioning operation will run. Instead of being limited to a single scheduler or user, VMRM accepts XML messages from the network and responds with XML results; this allows any program that can send and receive XML messages to use VMRM's capabilities. For each image, the administrator can specify multiple IPs and hostnames. VMRM performs actions or responds to queries based on the incoming message, and XML document type definitions are defined for interacting with it, so external programs and tools can use VMRM without worrying about the details of managing virtual machines.
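Since VMRM's concrete message schema is not given here, the following sketch uses a purely hypothetical query format to show how any external tool could drive such an XML-over-the-network interface; the host, port, and element names are all invented for illustration.

```python
# Hypothetical client for an XML-speaking resource manager: build a query,
# send it over a plain TCP connection, and parse the XML reply.
import socket
import xml.etree.ElementTree as ET

def query_vm_count(host, port):
    # hypothetical request: <query type="vm-count" />
    request = ET.tostring(ET.Element("query", type="vm-count"))
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        sock.shutdown(socket.SHUT_WR)          # signal end of request
        response = b"".join(iter(lambda: sock.recv(4096), b""))
    # hypothetical reply: <result><vm-count>42</vm-count></result>
    return int(ET.fromstring(response).findtext("vm-count"))

# usage (assumes a manager listening on the invented address):
# print(query_vm_count("vmrm.example.org", 9090))
```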

4 Opportunities for our research


One way to increase the reliability of an HPC system is to use checkpointing to save the state of an application. W. Emeneker et al. suggest Lazy Synchronous Checkpoint (LSC) as an implementation of Dynamic Virtual Clustering (DVC) to enable completely transparent parallel checkpointing. Results from the LSC prototype are encouraging for clusters without too many nodes; however, applying it to a larger cluster requires considerable work to increase the robustness of the solution. Extending LSC to enable parallel migration is therefore the next task for increasing cluster reliability with Dynamic Virtual Clusters. Furthermore, parallel checkpointing with advanced networks like InfiniBand is currently functioning, but only with BLCR. A similar extension of DVC parallel checkpointing working with InfiniBand is also desired, but requires developing drivers capable of executing in virtual machines. [27]

References
[1] W. Emeneker and A. Apon, HPC Virtual Machine Resource Management. MG '08, January 29 - February 3, 2008, Baton Rouge, LA, USA, 2008.
[2] X. H. Shi, H. Y. Tan, S. Wu and H. Jin, VNIX: Managing Virtual Machines on Clusters. 2008 Japan-China Joint Workshop on Frontier of Computer Science and Technology, Wuhan, China, December 2008.

[3] L. Grit, D. Irwin, A. Yumerefendi and J. Chase, Virtual Machine Hosting for Networked Clusters: Building the Foundations for Autonomic Orchestration. Department of Computer Science, Duke University, USA.
[4] A. Verma, P. Ahuja and A. Neogi, Power-aware Dynamic Placement of HPC Applications. ICS '08, June 7-12, 2008, Island of Kos, Aegean Sea, Greece, 2008.
[5] K. Begnum and M. Disney, Decision Support for Virtual Machine Re-Provisioning in Production Environments. 21st Large Installation System Administration Conference (LISA '07), Oslo University College, Norway, September 2007.
[6] R. G. Minnich and D. W. Rudish, Ten Million and One Penguins, or, Lessons Learned from Booting Millions of Virtual Machines on HPC Systems. Workshop on System-level Virtualization for High Performance Computing, in conjunction with EuroSys 2010, Paris, France, April 13, 2010.
[7] J. N. Matthews, W. Hu, M. Hapuarachchi, T. Deshane, D. Dimatos, G. Hamilton, M. McCabe and J. Owens, Quantifying the Performance Isolation Properties of Virtualization Systems. ExpCS '07, June 13-14, 2007, San Diego, CA, USA.
[8] D. Gupta, L. Cherkasova, R. Gardner and A. Vahdat, Enforcing Performance Isolation Across Virtual Machines in Xen. University of California, San Diego, CA, USA, 2008.
[9] T. Cucinotta, D. Giani, D. Faggioli and F. Checconi, Providing Performance Guarantees to Virtual Machines Using Real-Time Scheduling. Scuola Superiore Sant'Anna, Pisa, Italy, July 2010.
[10] H. Zhong and J. Nieh, CRAK: Linux Checkpoint/Restart As a Kernel Module. Technical Report CUCS-014-01, Department of Computer Science, Columbia University, New York, USA, November 2001.
[11] K. Chanchio, C. Leangsuksun, H. Ong, V. Ratanasamoot and A. Shafi, An Efficient Virtual Machine Checkpointing Mechanism for Hypervisor-based HPC Systems. USA, April 8, 2008.
[12] B. J. Kim, Comparison of the Existing Checkpoint Systems. IBM Watson, October 12, 2005.
[13] T. Tannenbaum, Zap & VM Migration: Zap and Virtual Machine Process Migration. Distributed Systems, University of Wisconsin-Madison, USA, February 3, 2006.
[14] K. Lee, MIGSOCK vs. Zap. Carnegie Mellon University, Pittsburgh, USA, May 2, 2004.
[15] M. H. Sun and D. M. Blough, Fast, Lightweight Virtual Machine Checkpointing. Georgia Institute of Technology, May 2010.

[16] W. Huang, Q. Gao, J. Liu and D. K. Panda, High Performance Virtual Machine Migration with RDMA over Modern Interconnects. Computer Science and Engineering, Ohio State University, Columbus, Ohio, USA, 2008.
[17] J. G. Hansen, Virtual Machine Mobility with Self-Migration. Ph.D. thesis, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark, April 7, 2009.
[18] W. Huang, J. Liu, M. Koop, B. Abali and D. K. Panda, Nomad: Migrating OS-bypass Networks in Virtual Machines. Computer Science and Engineering, Ohio State University, Columbus, Ohio, USA, June 2006.
[19] M. Nelson, B. Lim and G. Hutchins, Fast Transparent Migration for Virtual Machines. VMware, Inc., Palo Alto, CA, USA, March 2005.
[20] M. R. Hines, U. Deshpande and K. Gopalan, Post-Copy Live Migration of Virtual Machines. Computer Science, Binghamton University, May 2009.
[21] J. Oberheide, E. Cooke and F. Jahanian, Empirical Exploitation of Live Virtual Machine Migration. Electrical Engineering and Computer Science Department, University of Michigan, Michigan, USA, June 2007.
[22] W. Huang, High Performance Network I/O in Virtual Machines over Modern Interconnects. Ph.D. dissertation, The Ohio State University, Ohio, USA, 2008.
[23] W. Huang, M. J. Koop, Q. Gao and D. K. Panda, Virtual Machine Aware Communication Libraries for High Performance Computing. SC '07, November 10-16, 2007, Reno, Nevada, USA.
[24] F. Diakhate, M. Perache, R. Namyst and H. Jourdren, Efficient Shared Memory Message Passing for Inter-VM Communications. CEA DAM Ile de France, France, March 2009.
[25] I. Mevag, Towards Automatic Management and Live Migration of Virtual Machines. Master's thesis, Oslo University College, Oslo, Norway, May 23, 2007.
[26] S. L. Scott, G. Vallee, T. Naughton, A. Tikotekar, C. Engelmann and H. Ong, System-Level Virtualization Research at Oak Ridge National Laboratory. Oak Ridge National Laboratory, Oak Ridge, USA, January 2008.
[27] W. Emeneker and D. Stanzione, High Performance Computing Initiative. Fulton School of Engineering, Arizona State University, USA, October 2006.

