Abstract
Virtualization is a common strategy for improving the utilization of existing computing resources, particularly within data centers. However, its use for high performance computing (HPC) applications is currently limited, despite its potential both to improve resource utilization and to provide resource guarantees to its users. In this article, we systematically evaluate three major virtual machine implementations for computationally intensive HPC applications using various standard benchmarks. Using VMware Server, Xen, and OpenVZ, we examine the suitability of full virtualization (VMware), paravirtualization (Xen), and operating system-level virtualization (OpenVZ) in terms of network utilization, SMP performance, file system performance, and MPI scalability. We show that the operating system-level virtualization offered by OpenVZ yields the best overall performance, particularly for MPI scalability. Building on the results of this evaluation, we extend OpenVZ to support checkpointing and fault tolerance for MPI-based distributed computing on virtual servers.
Cite this article
Walters, J.P., Chaudhary, V. A fault-tolerant strategy for virtualized HPC clusters. J Supercomput 50, 209–239 (2009). https://doi.org/10.1007/s11227-008-0259-0