
A New Approach for High Performance Computing Systems with Various Checkpointing Schemes

The Journal of Supercomputing

Abstract

Roll-forward recovery schemes were proposed to enhance the performance of fault-tolerant systems that employ checkpointing. In these schemes, multiple processors perform roll-forward and validation processing simultaneously. This paper proposes a sample comparison approach used together with checkpointing, which further improves performance by reducing the overhead that checkpointing imposes. We also develop general analytical models for estimating availability that are applicable to any checkpointing scheme. Performance comparisons reveal that checkpointing schemes with sample comparison achieve higher availability than the same schemes without it, while the required checkpoint interval is larger.



Author information


Corresponding author

Correspondence to Gyung-Leen Park.


About this article

Cite this article

Park, GL., Yong, H.Y. A New Approach for High Performance Computing Systems with Various Checkpointing Schemes. J Supercomput 33, 65–78 (2005). https://doi.org/10.1007/s11227-005-0221-3


  • DOI: https://doi.org/10.1007/s11227-005-0221-3
