A Survey of Present Research On Virtual Cluster
Abstract
As the demand for computing power keeps growing, modern computing clusters expand in both performance and dimension by comprising more and more hardware, which requires significant effort for system maintenance. Meanwhile, although the application of virtual machines to high performance computing is considered an effective way to simplify hardware management, virtualization has not gained as much use on HPC systems as expected. The aim of this survey is to identify the factors that limit the application of virtual machines to HPC and to lay the groundwork for the virtual cluster project by providing relatively comprehensive, up-to-date information on current research related to virtual machines in the context of a large cluster. The survey covers a broad range of information sources, namely conference proceedings, journals, doctoral dissertations and papers available on the Internet. We summarize key papers and list research groups undertaking projects that we feel are worth taking note of. From the gathered information we identify unsolved problems and potential areas of future work.
Contents
1 Completed Research Work
  1.1 Introduction
  1.2 OS-Customization and Ease of Management
  1.3 Performance and Resource Isolation
  1.4 Checkpoint/Restart and Fault Tolerance
  1.5 VM Migration
2 Bottlenecks and Challenges
  2.1 Performance Overhead
  2.2 Incompatibility Problem with Paravirtualization
  2.3 Challenges for VM Application on HPC
3 Batch System for Virtual Machines
References
They considered adopting techniques such as dynamic consolidation and the dynamic power range enabled by low-power states on servers. Moreover, they identified application isolation and virtualization overhead with multiple virtual machines as the key bottlenecks for server consolidation. With these insights, a framework and methodology for power-aware placement of high performance computing applications were introduced. K. Begnum et al. [5] suggested three new metrics with which the state of an infrastructure for virtualization can be described and analyzed, in order to tie virtual machine management more closely to policy. The metrics have been implemented in the latest release of their MLN (Manage Large Networks) tool for virtual machine management. A remarkable effort introduced by R. G. Minnich et al. [6] is Megatux, a set of tools under development for booting one million Linux virtual machines on the Thunderbird (4660 nodes) cluster and 555,000 on the Hyperion (1024 nodes) cluster. In contrast to existing high performance computing systems, Megatux allows all systems to be booted at the same time, with static configuration files defining the role of each node.
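The consolidation idea can be sketched with a toy first-fit-decreasing placement: pack VMs onto as few hosts as possible so that the remaining hosts can enter low-power states. This is a hypothetical illustration only; the cited framework also weighs application isolation and virtualization overhead, which this sketch ignores.

```python
def consolidate(vm_loads, capacity=1.0):
    """First-fit-decreasing packing of VM loads onto hosts.

    Toy model of dynamic consolidation: each host holds VMs whose
    combined load stays within `capacity`; hosts left empty can be
    put into a low-power state.
    """
    hosts = []                                  # each host: list of VM loads
    for load in sorted(vm_loads, reverse=True):
        for host in hosts:
            if sum(host) + load <= capacity:
                host.append(load)               # fits on an existing host
                break
        else:
            hosts.append([load])                # must power on another host

    return hosts


placement = consolidate([0.5, 0.4, 0.3, 0.2, 0.1])
assert len(placement) == 2                      # five VMs fit on two hosts
assert all(sum(h) <= 1.0 for h in placement)    # no host is overcommitted
```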
M. H. Sun et al. [15] implemented a fast, lightweight virtual machine checkpointing solution for Xen based on copy-on-write techniques. It outperforms the next best solution, which uses a pre-copy-based strategy.
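The copy-on-write idea behind such checkpointing can be sketched at user level. This is a simplified illustration of the technique, not the actual Xen implementation: the checkpoint is declared instantly, and a page's original contents are preserved lazily on the first write after the checkpoint, so the VM never stops to copy all of memory.

```python
class CowCheckpoint:
    """Simplified user-level illustration of copy-on-write checkpointing."""

    def __init__(self, memory):
        self.memory = memory          # live "guest" memory: page -> data
        self.saved = {}               # original contents preserved for
                                      # pages written since the checkpoint

    def write(self, page, data):
        # First write to a page after the checkpoint: save the original.
        if page not in self.saved:
            self.saved[page] = self.memory.get(page)
        self.memory[page] = data      # the guest keeps running

    def checkpoint_view(self):
        # Checkpoint = current memory overlaid with the saved originals.
        view = dict(self.memory)
        view.update(self.saved)
        return view


mem = {0: b"a", 1: b"b"}
ckpt = CowCheckpoint(mem)
ckpt.write(0, b"x")                          # page 0 modified afterwards
assert ckpt.checkpoint_view()[0] == b"a"     # checkpoint still sees "a"
assert mem[0] == b"x"                        # live VM sees the new data
```

Only modified pages are ever copied, which is what makes the checkpoint lightweight compared with a full stop-and-copy of memory.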
1.5 VM Migration
Migration is one of the most useful features of virtual machine technology: performance, management and fault tolerance can all benefit from it. It makes it possible to move running operating system instances across distinct physical nodes to achieve load balance, and resources can be exploited more fully by migrating processes from one node to another. Huang et al. [16] proposed a high performance virtual machine migration design using RDMA (Remote Direct Memory Access); with the help of RDMA the total migration overhead is drastically reduced, as is the performance impact on hosted applications. Moreover, live migration of virtual machines often allows workload movement with only a short service downtime. In his dissertation, dedicated to a design for virtual machine mobility that adds no functionality to the hosting environment, J. G. Hansen contributed an algorithm that can live-migrate a running operating system between two physical hosts with very little downtime and without any support from the underlying virtual machine monitor or host environment. On this basis he introduced a cluster-management system that supports job mobility with a high degree of flexibility; on top of this system, running real-life workloads and rapidly instantiating jobs using hypercube network-forks become a reality [17]. Other work in this area includes the following. Nomad [18], a design for migration in virtual cluster environments, can efficiently migrate network resources, even in environments with stringent communication performance requirements. An implementation of fast transparent migration for virtual machines [19] allows an entire running virtual machine to be moved from one physical node to another in a manner completely transparent to the application, the OS and remote clients.
The method used to migrate physical memory is critical for migration performance. In contrast to the traditional pre-copy approach, an implementation of post-copy live migration of virtual machines [20] can reduce the total migration time while maintaining the liveness of the virtual machine. Last but not least, with a glance at the security of virtual machine live migration, J. Oberheide et al. [21] identified three classes of threats to virtual machine migration: control plane, data plane and migration module threats. Based on this experience, strategies for hardening virtualization software and the live migration process are presented.
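The trade-off between pre-copy and post-copy can be illustrated with a toy simulation. The page count, dirtying rate and round limit below are hypothetical, not taken from any of the cited implementations: pre-copy may resend pages that the running VM dirties between rounds, while post-copy transfers execution first and then sends each page exactly once.

```python
import random

random.seed(0)

PAGES = 1000       # total guest memory pages (hypothetical)
DIRTY_RATE = 100   # pages the running VM dirties per copy round

def pre_copy(max_rounds=5):
    """Iteratively copy dirty pages while the VM runs, then stop-and-copy."""
    transferred = 0
    dirty = set(range(PAGES))                  # round 1: all pages are dirty
    for _ in range(max_rounds):
        transferred += len(dirty)
        # While we were copying, the running VM dirtied some pages again.
        dirty = set(random.sample(range(PAGES), DIRTY_RATE))
    transferred += len(dirty)                  # final stop-and-copy round
    return transferred

def post_copy():
    """Move execution first; fetch each page on first access at the target."""
    return PAGES                               # every page is sent exactly once

assert post_copy() == PAGES
assert pre_copy() > PAGES                      # pre-copy resends dirtied pages
```

The simulation shows why post-copy can reduce total migration time: its total transfer is bounded by the memory size, whereas pre-copy's depends on how fast the workload dirties pages.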
checkpointing, and migrating. Atomic operations ensure that each operation leaves the system in a consistent state. In addition, more than one type of virtual machine, including Xen, VMware and QEMU, may be hosted by VMRM. Another particular feature of VMRM is support for queries, which are answered with information about the state of the system, statistics, and predictions about operations. Queries can determine how many virtual machines are in use, which nodes can support the different types of VMs, and how long a provisioning operation is predicted to run. Instead of being limited to a single scheduler or user, VMRM accepts XML messages from the network and responds with XML results, so any program that can send and receive XML messages can use VMRM's capabilities. For each image, the administrator can specify multiple IPs and hostnames. XML document type definitions are defined for interacting with the VMRM, so external programs and tools can use it without needing to worry about the details of managing virtual machines.
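Such an XML message interface might look as follows. The message schema here is purely hypothetical (the real VMRM defines its own document types); the sketch only shows the dispatch pattern: parse an incoming XML query, answer with an XML result.

```python
import xml.etree.ElementTree as ET

def handle_query(xml_text, state):
    """Toy VMRM-style dispatcher: parse an XML query, reply with XML."""
    msg = ET.fromstring(xml_text)
    if msg.tag == "query" and msg.get("type") == "vm-count":
        reply = ET.Element("result")
        reply.set("vm-count", str(len(state["vms"])))
        return ET.tostring(reply, encoding="unicode")
    return "<error reason='unknown-message'/>"


# Hypothetical query asking how many virtual machines are in use.
state = {"vms": ["xen-01", "qemu-02"]}
answer = handle_query("<query type='vm-count'/>", state)
assert ET.fromstring(answer).get("vm-count") == "2"
```

Because both sides only exchange XML text, any scheduler or tool, in any language, can drive the manager without linking against its internals.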
References
[1] W. Emenecker and A. Apon, HPC Virtual Machine Resource Management. MG'08, January 29-February 3, 2008, Baton Rouge, LA, USA, 2008.
[2] X. H. Shi, H. Y. Tan, S. Wu and H. Jin, VNIX: Managing Virtual Machines on Clusters. 2008 Japan-China Joint Workshop on Frontier of Computer Science and Technology, Wuhan, China, December 2008.
[3] L. Grit, D. Irwin, A. Yumerefendi and J. Chase, Virtual Machine Hosting for Networked Clusters: Building the Foundations for Autonomic Orchestration. Department of Computer Science, Duke University, USA.
[4] A. Verma, P. Ahuja and A. Neogi, Power-aware Dynamic Placement of HPC Applications. ICS'08, June 7-12, 2008, Island of Kos, Aegean Sea, Greece, 2008.
[5] K. Begnum and M. Disney, Decision Support for Virtual Machine Re-Provisioning in Production Environments. 21st Large Installation System Administration Conference (LISA '07), Oslo University College, Norway, September 2007.
[6] R. G. Minnich and D. W. Rudish, Ten Million and One Penguins, or, Lessons Learned from Booting Millions of Virtual Machines on HPC Systems. Workshop on System-level Virtualization for High Performance Computing in conjunction with EuroSys 2010, Paris, France, April 13, 2010.
[7] J. N. Matthews, W. Hu, M. Hapuarachchi, T. Deshane, D. Dimatos, G. Hamilton, M. McCabe and J. Owens, Quantifying the Performance Isolation Properties of Virtualization Systems. ExpCS'07, June 13-14, 2007, San Diego, CA, USA.
[8] D. Gupta, L. Cherkasova, R. Gardner and A. Vahdat, Enforcing Performance Isolation Across Virtual Machines in Xen. University of California, San Diego, CA, USA, 2008.
[9] T. Cucinotta, D. Giani, D. Faggioli and F. Checconi, Providing Performance Guarantees to Virtual Machines Using Real-Time Scheduling. Scuola Superiore Sant'Anna, Pisa, Italy, July 2010.
[10] H. Zhong and J. Nieh, CRAK: Linux Checkpoint/Restart As a Kernel Module. Technical Report CUCS-014-01, Department of Computer Science, Columbia University, USA, November 2001.
[11] K. Chanchio, C. Leangsuksun, H. Ong, V. Ratanasamoot and A. Sha, An Efficient Virtual Machine Checkpointing Mechanism for Hypervisor-based HPC Systems. USA, April 8, 2008.
[12] B. J. Kim, Comparing of the Existing Checkpoint Systems. Watson/IBM, October 12, 2005.
[13] T. Tannenbaum, Zap & VM Migration: Zap and Virtual Machine Process Migration. Distributed Systems, University of Wisconsin, Madison, USA, February 3, 2006.
[14] K. Lee, MIGSOCK vs. Zap. Carnegie Mellon University, Pittsburgh, USA, May 2, 2004.
[15] M. H. Sun and D. M. Blough, Fast Lightweight Virtual Machine Checkpointing. Georgia Institute of Technology, May 2010.
[16] W. Huang, Q. Gao, J. Liu and D. K. Panda, High Performance Virtual Machine Migration with RDMA over Modern Interconnects. Computer Science and Engineering, Ohio State University, Columbus, Ohio, USA, 2008.
[17] J. G. Hansen, Virtual Machine Mobility with Self-Migration. Ph.D. Thesis, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark, April 7, 2009.
[18] W. Huang, J. Liu, M. Koop, B. Abali and D. K. Panda, Nomad: Migrating OS-bypass Networks in Virtual Machines. Computer Science and Engineering, Ohio State University, Columbus, Ohio, USA, June 2006.
[19] M. Nelson, B. Lim and G. Hutchins, Fast Transparent Migration for Virtual Machines. VMware, Inc., Palo Alto, CA, USA, March 2005.
[20] M. R. Hines, U. Deshpande and K. Gopalan, Post-Copy Live Migration of Virtual Machines. Computer Science, Binghamton University, May 2009.
[21] J. Oberheide, E. Cooke and F. Jahanian, Empirical Exploitation of Live Virtual Machine Migration. Electrical Engineering and Computer Science Department, University of Michigan, Michigan, USA, June 2007.
[22] W. Huang, High Performance Network I/O in Virtual Machines Over Modern Interconnects. Ph.D. Dissertation, The Ohio State University, Ohio, USA, 2008.
[23] W. Huang, M. J. Koop, Q. Gao and D. K. Panda, Virtual Machine Aware Communication Libraries for High Performance Computing. SC '07, November 10-16, 2007, Reno, Nevada, USA.
[24] F. Diakhate, M. Perache, R. Namyst and H. Jourdren, Efficient Shared Memory Message Passing for Inter-VM Communications. CEA DAM Ile de France, France, March 2009.
[25] I. Mevag, Towards Automatic Management and Live Migration of Virtual Machines. Master's Thesis, Oslo University College, Oslo, Norway, May 23, 2007.
[26] S. L. Scott, G. Vallee, T. Naughton, A. Tikotekar, C. Engelmann and H. Ong, System-Level Virtualization Research at Oak Ridge National Laboratory. Oak Ridge National Laboratory, Oak Ridge, USA, January 2008.
[27] W. Emenecker and D. Stanzione, High Performance Computing Initiative. Fulton School of Engineering, Arizona State University, USA, October 2006.