A fault-tolerant strategy for virtualized HPC clusters

The Journal of Supercomputing

Abstract

Virtualization is a common strategy for improving the utilization of existing computing resources, particularly within data centers. However, its use for high-performance computing (HPC) applications is currently limited, despite its potential both to improve resource utilization and to provide resource guarantees to users. In this article, we systematically evaluate three major virtual machine implementations for computationally intensive HPC applications using various standard benchmarks. Using VMware Server, Xen, and OpenVZ, we examine the suitability of full virtualization (VMware), paravirtualization (Xen), and operating system-level virtualization (OpenVZ) in terms of network utilization, SMP performance, file system performance, and MPI scalability. We show that the operating system-level virtualization offered by OpenVZ delivers the best overall performance, particularly for MPI scalability. Building on the results of this VM evaluation, we extend OpenVZ to support checkpointing and fault tolerance for MPI-based distributed computing on virtual servers.

Corresponding author

Correspondence to John Paul Walters.

Cite this article

Walters, J.P., Chaudhary, V. A fault-tolerant strategy for virtualized HPC clusters. J Supercomput 50, 209–239 (2009). https://doi.org/10.1007/s11227-008-0259-0

  • DOI: https://doi.org/10.1007/s11227-008-0259-0
