
Applicability of Generic Naming Services and Fault-Tolerant Metacomputing with FT-MPI

  • Conference paper
Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI 2005)

Abstract

There is growing interest in deploying MPI over multiple, heterogeneous and geographically distributed resources for performing very large-scale computations. However, increasing the degree of geographical distribution and the number of resources creates problems with interoperability and fault-tolerance. FT-MPI presents an interesting solution for adding fault-tolerance to MPI, but suffers from interoperability limitations and potential single points of failure when crossing multiple administrative domains. We propose to overcome these limitations by adding “pluggability” for one potential single point of failure – the name service used by FT-MPI – and by combining FT-MPI with the H2O metacomputing framework.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dewolfs, D., Kurzyniec, D., Sunderam, V., Broeckhove, J., Dhaene, T., Fagg, G. (2005). Applicability of Generic Naming Services and Fault-Tolerant Metacomputing with FT-MPI. In: Di Martino, B., Kranzlmüller, D., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2005. Lecture Notes in Computer Science, vol 3666. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557265_36

  • DOI: https://doi.org/10.1007/11557265_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29009-4

  • Online ISBN: 978-3-540-31943-6

  • eBook Packages: Computer Science (R0)
