An Efficient Computing-Checkpoint Based Coordinated Checkpoint Algorithm

  • Conference paper
Embedded and Ubiquitous Computing (EUC 2006)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 4096))

Abstract

This paper introduces the concept of a “computing checkpoint” and proposes an efficient coordinated checkpoint algorithm based on it. The algorithm combines two approaches to reducing the overhead of coordinated checkpointing: minimizing the number of processes that take checkpoints, and making the checkpointing process non-blocking. The broadcast commit message piggybacks information on which processes have taken a new checkpoint, so every process holds a consistent view of the checkpoint sequence numbers of all processes, which avoids unnecessary checkpoints and orphan messages in later execution. The evaluation shows that the number of redundant computing checkpoints is less than 1/10 of the number of tentative checkpoints. Analysis and experiments show that the algorithm incurs lower overhead than other coordinated checkpoint algorithms.
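
The piggybacking idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract gives no message formats or data structures, so the names `CommitMessage`, `Process`, `csn`, and `broadcast_commit` are all hypothetical, and the network is simulated by a plain loop. The sketch only shows why naming the checkpointed processes in the commit broadcast lets every process keep the same view of the checkpoint sequence numbers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CommitMessage:
    """Hypothetical commit message; field names are assumptions."""
    initiator: int
    checkpointed: frozenset  # ids of processes that took a new checkpoint


@dataclass
class Process:
    pid: int
    n: int  # total number of processes
    csn: list = field(init=False)  # csn[j]: known checkpoint sequence number of process j

    def __post_init__(self):
        self.csn = [0] * self.n

    def on_commit(self, msg):
        # Advance the sequence number of every process named in the
        # piggybacked set, whether or not this process checkpointed itself.
        for j in msg.checkpointed:
            self.csn[j] += 1


def broadcast_commit(processes, initiator, checkpointed):
    """Simulate a reliable broadcast of the commit message (assumed)."""
    msg = CommitMessage(initiator, frozenset(checkpointed))
    for p in processes:
        p.on_commit(msg)
    return msg


procs = [Process(pid=i, n=4) for i in range(4)]
broadcast_commit(procs, initiator=0, checkpointed={0, 2})  # round 1
broadcast_commit(procs, initiator=2, checkpointed={2, 3})  # round 2
views = [p.csn for p in procs]
```

After both rounds, every entry of `views` is the same vector, including for process 1, which never took a checkpoint. In this sketch that shared view is what would let a process decide whether an incoming message requires a checkpoint before delivery.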

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chaoguang, M., Dongsheng, W., Yunlong, Z. (2006). An Efficient Computing-Checkpoint Based Coordinated Checkpoint Algorithm. In: Sha, E., Han, SK., Xu, CZ., Kim, MH., Yang, L.T., Xiao, B. (eds) Embedded and Ubiquitous Computing. EUC 2006. Lecture Notes in Computer Science, vol 4096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11802167_12

  • DOI: https://doi.org/10.1007/11802167_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36679-9

  • Online ISBN: 978-3-540-36681-2

  • eBook Packages: Computer Science, Computer Science (R0)
