Checkpointing and Communication Pattern-Neutral Algorithm for Removing Messages Logged by Senders

  • Conference paper
High Performance Computing and Communications (HPCC 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4208)

Abstract

Traditional sender-based message logging protocols use a garbage collection algorithm that results in a large number of additional messages and forced checkpoints. In our previous work, we therefore introduced an algorithm that lets each process autonomously remove useless log information from its volatile storage by piggybacking only a small amount of additional information, without requiring any extra messages or forced checkpoints. However, even after a process has executed that algorithm, its storage buffer may still be overloaded under some communication and checkpointing patterns. This paper proposes a new garbage collection algorithm for sender-based message logging, CCPNA (Checkpointing and Communication Pattern-Neutral Algorithm), that addresses the problems mentioned above. The algorithm uses the size of each process's log information to considerably reduce the number of processes that must participate in garbage collection. CCPNA does incur more additional messages and forced checkpoints than our previous algorithm, but it avoids the risk of overloading storage buffers regardless of the specific checkpointing and communication patterns. CCPNA also requires fewer additional messages and forced checkpoints than the traditional algorithm.
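
The abstract does not spell out how log sizes are used, so the following is only a minimal sketch of one plausible reading, not the paper's algorithm. It assumes a sender keeps its volatile log grouped by receiver and, when space is needed, greedily picks the receivers whose messages occupy the most log space; only those receivers would then be asked to checkpoint so that their entries can be discarded. The names `SenderLog`, `select_receivers`, and `bytes_to_free` are hypothetical.

```python
from collections import defaultdict


class SenderLog:
    """Volatile sender-side message log grouped by receiver (illustrative only)."""

    def __init__(self):
        # receiver id -> list of logged message payloads (bytes)
        self.entries = defaultdict(list)

    def record(self, receiver, payload):
        """Log a message that was sent to `receiver`."""
        self.entries[receiver].append(payload)

    def size_by_receiver(self):
        """Return the number of logged bytes held for each receiver."""
        return {r: sum(len(p) for p in msgs) for r, msgs in self.entries.items()}

    def total_size(self):
        return sum(self.size_by_receiver().values())

    def discard(self, receiver):
        """Drop all entries for a receiver, assumed to have checkpointed past them."""
        self.entries.pop(receiver, None)


def select_receivers(log, bytes_to_free):
    """Assumed policy: pick the largest log holders first until enough bytes
    would be freed, so that as few processes as possible are involved."""
    chosen, freed = [], 0
    # Sort by descending size; break ties by receiver id for determinism.
    ranked = sorted(log.size_by_receiver().items(), key=lambda kv: (-kv[1], kv[0]))
    for receiver, size in ranked:
        if freed >= bytes_to_free:
            break
        chosen.append(receiver)
        freed += size
    return chosen
```

In this sketch, each receiver returned by `select_receivers` corresponds to one extra request message and one forced checkpoint, which is where the trade-off described in the abstract would show up.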



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ahn, J. (2006). Checkpointing and Communication Pattern-Neutral Algorithm for Removing Messages Logged by Senders. In: Gerndt, M., Kranzlmüller, D. (eds) High Performance Computing and Communications. HPCC 2006. Lecture Notes in Computer Science, vol 4208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11847366_8

  • DOI: https://doi.org/10.1007/11847366_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39368-9

  • Online ISBN: 978-3-540-39372-6

  • eBook Packages: Computer Science, Computer Science (R0)
