Abstract
This talk will describe an implementation of MPI that extends the message passing model to allow an application to recover when a process fails. Our implementation allows the user to catch the fault and then carry out a recovery.
We will also touch on the issues related to using diskless checkpointing to allow for effective recovery of an application in the presence of a process fault.
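As a rough illustration of the idea behind diskless checkpointing, the sketch below keeps a parity block in memory instead of writing checkpoints to disk, then rebuilds the state of one failed process from the survivors. The XOR parity scheme, the function names (`make_parity`, `recover`), and the byte-string "states" are assumptions made for this example. The abstract does not say which encoding the talk uses, and no MPI calls are shown here.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("checkpoint buffers must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(states: list[bytes]) -> bytes:
    """Hypothetical checkpoint step: XOR all process states into one parity block.

    In a real system this block would live in the memory of a spare process.
    """
    return reduce(xor_bytes, states)


def recover(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing state from the parity block and the survivors."""
    return reduce(xor_bytes, survivors, parity)


# Three simulated process states of equal length (illustrative data only).
states = [b"rank0-data", b"rank1-data", b"rank2-data"]
parity = make_parity(states)

# Simulate losing process 1 and rebuilding its state in memory.
failed = 1
survivors = [s for i, s in enumerate(states) if i != failed]
rebuilt = recover(survivors, parity)
```

A single XOR parity block tolerates only one failure per checkpoint group. Surviving several simultaneous failures would need a stronger code or more parity blocks.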
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dongarra, J.J. (2004). Fault Tolerance in Message Passing and in Action. In: Kranzlmüller, D., Kacsuk, P., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2004. Lecture Notes in Computer Science, vol 3241. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30218-6_3
DOI: https://doi.org/10.1007/978-3-540-30218-6_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23163-9
Online ISBN: 978-3-540-30218-6
eBook Packages: Springer Book Archive