Skip to main content

Scalable Fault Tolerant MPI: Extending the Recovery Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 3666))

Abstract

Fault Tolerant MPI (FT-MPI) [6] was designed as a solution to allow applications different methods to handle process failures beyond simple check-point restart schemes. The initial implementation of FT-MPI included a robust heavy weight system state recovery algorithm that was designed to manage the membership of MPI communicators during multiple failures. The algorithm and its implementation although robust, was very conservative and this effected its scalability on both very large clusters as well as on distributed systems. This paper details the FT-MPI recovery algorithm and our initial experiments with new recovery algorithms that are aimed at being both scalable and latency tolerant. Our conclusions shows that the use of both topology aware collective communication and distributed consensus algorithms together produce the best results.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agbaria, A., Friedman, R.: Starfish: Fault-tolerant dynamic mpi programs on clusters of workstations. In: 8th IEEE International Symposium on High Performance Distributed Computing (1999)

    Google Scholar 

  2. Batchu, R., Neelamegam, J., Cui, Z., Beddhua, M., Skjellum, A., Dandass, Y., Apte, M.: Mpi/ftTM: Architecture and taxonomies for fault-tolerant, message-passing middleware for performance-portable parallel computing. In: Proceedings of the 1st IEEE International Symposium of Cluster Computing and the Grid held in Melbourne, Australia (2001)

    Google Scholar 

  3. Beck, M., Dongarra, J.J., Fagg, G.E., Geist, G.A., Gray, P., Kohl, J., Migliardi, M., Moore, K., Moore, T., Papadopoulous, P., Scott, S.L., Sunderam, V.: HARNESS:a next generation distributed virtual machine. Future Generation Computer Systems 15 (1999)

    Google Scholar 

  4. Bosilca, G., Bouteiller, A., Cappello, F., Djilali, S., Fédak, G., Germain, C., Hérault, T., Lemarinier, P., Lodygensky, O., Magniette, F., Néri, V., Selikhov, A.: MPICH-v: Toward a scalable fault tolerant MPI for volatile nodes. In: SuperComputing, Baltimore USA (November 2002)

    Google Scholar 

  5. Burns, G., Daoud, R.: Robust MPI message delivery through guaranteed resources. In: MPI Developers Conference (June 1995)

    Google Scholar 

  6. Fagg, G.E., Bukovsky, A., Dongarra, J.J.: HARNESS and fault tolerant MPI. Parallel Computing 27, 1479–1496 (2001)

    Article  MATH  Google Scholar 

  7. Fagg, G.E., Moore, K., Dongarra, J.J.: Scalable networked information processing environment (SNIPE). Future Generation Computing Systems 15, 571–582 (1999)

    Article  Google Scholar 

  8. Graham, R.L., Choi, S.-E., Daniel, D.J., Desai, N.N., Minnich, R.G., Rasmussen, C.E., Risinger, L.D., Sukalski, M.W.: A network-failure-tolerant message-passing system for terascale clusters. In: ICS, New York, USA, June 22-26 (2002)

    Google Scholar 

  9. Louca, S., Neophytou, N., Lachanas, A., Evripidou, P.: Mpi-ft: Portable fault tolerance scheme for MPI. In: Parallel Processing Letters, vol. 10(4), pp. 371–382. World Scientific Publishing Company, Singapore (2000)

    Google Scholar 

  10. Message Passing Interface Forum. MPI: A Message Passing Interface Standard (June 1995), http://www.mpi-forum.org/

  11. Message Passing Interface Forum. MPI-2: Extensions to the Message Passing Interface (July 1997), http://www.mpi-forum.org/

  12. Stellner, G.: Cocheck: Checkpointing and process migration for MPI. In: Proceedings of the 10th International Parallel Processing Symposium (IPPS 1996), Honolulu, Hawaii (1996)

    Google Scholar 

  13. Vadhiyar, S.S., Fagg, G.E., Dongarra, J.J.: Performance modeling for self-adapting collective communications for MPI. In: LACSI Symposium, Eldorado Hotel, Santa Fe, NM, October 15-18. Springer, Heidelberg (2001)

    Google Scholar 

  14. Sankaran, S., Squyres, J.M., Barrett, B., Lumsdaine, A., Duell, J., Hargrove, P., Roman, E.: The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing. In: LACSI Symposium, Santa Fe, NM (October 2003)

    Google Scholar 

  15. Gabriel, E., Fagg, G.E., Bosilica, G., Angskun, T., Dongarra, J.J., Squyres, J.M., Sahay, V., Kambadur, P., Barrett, B., Lumsdaine, A., Castain, R.H., Daniel, D.J., Graham, R.L., Woodall, T.S.: Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation. In: Proceedings 11th European PVM/MPI Users’ Group Meeting, Budapest, Hungry (2004)

    Google Scholar 

  16. Kielmann, T., Hofman, R.F.H., Bal, H.E., Plaat, A., Bhoedjang, R.A.F.: MagPIe: MPI’s collective communication operations for clustered wide area systems. In: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 1999), May 1999, vol. 34(8), pp. 131–140 (1999)

    Google Scholar 

  17. Tanenbaum, A.S., van Steen, M.: Distributed Systems: Principles and Paradigms. Prentice Hall, Englewood Cliffs (2002)

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fagg, G.E., Angskun, T., Bosilca, G., Pjesivac-Grbovic, J., Dongarra, J.J. (2005). Scalable Fault Tolerant MPI: Extending the Recovery Algorithm. In: Di Martino, B., Kranzlmüller, D., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2005. Lecture Notes in Computer Science, vol 3666. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557265_13

Download citation

  • DOI: https://doi.org/10.1007/11557265_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29009-4

  • Online ISBN: 978-3-540-31943-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics