Design and Implementation of Dynamic Process Management for Grid-Enabled MPICH

  • Conference paper
Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2840)

Abstract

This paper presents the design and implementation of MPI_Rejoin() for MPICH-GF, a grid-enabled, fault-tolerant MPICH implementation. To provide fault tolerance for MPI applications, a failed process must be able to recover and continue execution. However, current MPI implementations do not support dynamic process management, and it is not possible to restore the information regarding communication channels. The ‘rejoin’ operation allows the restored process to rejoin the existing group by updating the corresponding entries of the channel table with the new physical address. We have verified, by running NPB applications, that our implementation correctly reconstructs the MPI communication structure. We also report the cost of the ‘rejoin’ operation.
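The channel-table update described in the abstract can be illustrated with a minimal conceptual sketch. It is not the MPICH-GF implementation and does not call MPI: the names `ChannelEntry`, `ChannelTable`, `rejoin`, and `address_of`, the `(host, port)` form of a physical address, and the `epoch` counter are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ChannelEntry:
    """One row of a hypothetical channel table: where a rank can be reached."""
    host: str
    port: int
    epoch: int = 0  # assumed counter: how many times this rank has rejoined


class ChannelTable:
    """Hypothetical per-process table mapping each rank to its physical address."""

    def __init__(self, endpoints: Dict[int, Tuple[str, int]]) -> None:
        self.entries: Dict[int, ChannelEntry] = {
            rank: ChannelEntry(host, port)
            for rank, (host, port) in endpoints.items()
        }

    def rejoin(self, rank: int, host: str, port: int) -> ChannelEntry:
        """Overwrite the entry of a restored rank with its new physical address."""
        if rank not in self.entries:
            # Only an existing member of the group can rejoin it.
            raise KeyError(f"rank {rank} is not a member of the group")
        entry = self.entries[rank]
        entry.host = host
        entry.port = port
        entry.epoch += 1
        return entry

    def address_of(self, rank: int) -> Tuple[str, int]:
        """Return the address that peers should currently use for this rank."""
        entry = self.entries[rank]
        return (entry.host, entry.port)


# Rank 1 fails on node-b and is restored on node-c; rank 0 is untouched.
table = ChannelTable({0: ("node-a", 5000), 1: ("node-b", 5001)})
table.rejoin(1, "node-c", 6001)
```

In this sketch the rank stays the same and only the address behind it changes, which is why surviving processes can keep addressing the restored process by its original rank.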

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, S., Woo, N., Yeom, H.Y., Park, T., Park, HW. (2003). Design and Implementation of Dynamic Process Management for Grid-Enabled MPICH. In: Dongarra, J., Laforenza, D., Orlando, S. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2003. Lecture Notes in Computer Science, vol 2840. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39924-7_87

  • DOI: https://doi.org/10.1007/978-3-540-39924-7_87

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20149-6

  • Online ISBN: 978-3-540-39924-7

  • eBook Packages: Springer Book Archive
