
Checkpoint Management with Double Modular Redundancy Based on the Probability of Task Completion

  • Short Paper
Journal of Computer Science and Technology

Abstract

This paper proposes a checkpoint rollback strategy for real-time systems with double modular redundancy. Without requiring built-in fault detection or spare processors, our scheme can recover from both transient and permanent faults. Two comparisons are performed at each checkpoint. First, the states stored at two consecutive checkpoints of the same processor are compared to check that processor's integrity. Second, the states of the two processors are compared to detect faults, and the system rolls back to the previous checkpoint whenever the scheme's recovery logic requires it. The fault-recovery scheme induces a Markov model, which is analyzed to obtain the probability that the task completes within its deadline. The number of checkpoints is then chosen to maximize this probability.
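The abstract does not spell out the paper's Markov model, so the following is only a minimal sketch of the general idea under simplifying assumptions that are not taken from the paper: transient faults only, arriving on each processor as an independent Poisson process with a hypothetical rate `lam`; a task needing `T` time units of fault-free computation, split into `n` equal segments; a fixed overhead `c` per checkpoint; and a rollback of the current segment whenever the two processors' states disagree. It deliberately omits the consecutive-checkpoint self-comparison used for permanent faults. The Markov chain tracks how many segments have been committed, and the hypothetical helper `best_checkpoint_count` searches over `n` for the value that maximizes the probability of finishing by the deadline `D`.

```python
import math


def completion_probability(n, T, c, lam, D):
    """P(task finishes by deadline D) in a simplified, hypothetical DMR model.

    Illustrative assumptions (not the paper's model):
    - T units of fault-free work are split into n equal segments;
    - each segment ends with a checkpoint costing c (save + compare);
    - each processor suffers transient faults at Poisson rate lam;
    - a segment commits only if neither processor faulted during it
      (the two states match); otherwise both roll back and retry it.
    """
    tau = T / n + c                       # duration of one segment attempt
    q = math.exp(-2.0 * lam * tau)        # P(both processors fault-free)
    steps = math.floor(D / tau + 1e-9)    # attempts that fit before D (epsilon guards float error)
    # Markov chain: state j = committed segments (0..n); state n is absorbing.
    dist = [1.0] + [0.0] * n
    for _ in range(steps):
        nxt = [0.0] * (n + 1)
        nxt[n] = dist[n]                  # already finished: stays finished
        for j in range(n):
            nxt[j + 1] += q * dist[j]     # states match: commit and advance
            nxt[j] += (1.0 - q) * dist[j] # mismatch: roll back and retry
        dist = nxt
    return dist[n]


def best_checkpoint_count(T, c, lam, D, n_max=50):
    """Return (n, probability) maximizing completion probability over 1..n_max."""
    return max(
        ((n, completion_probability(n, T, c, lam, D)) for n in range(1, n_max + 1)),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    n, p = best_checkpoint_count(T=100.0, c=1.0, lam=1e-3, D=150.0)
    print(f"best n = {n}, P(complete by deadline) = {p:.4f}")
```

More checkpoints shorten each segment, so a rollback wastes less work, but each one adds the overhead `c`; the search over `n` captures that trade-off. Because every attempt has the same length in this simplified model, the absorption probability equals the binomial probability of at least `n` successes in `floor(D / tau)` trials; the step-by-step propagation is kept because it extends more naturally to richer state spaces, such as ones that also track permanent faults.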



Author information

Correspondence to Seong Woo Kwak.

Electronic Supplementary Material

The article is accompanied by electronic supplementary material (PDF, 77.3 kb).


About this article

Cite this article

Kwak, S.W., You, KH. & Yang, JM. Checkpoint Management with Double Modular Redundancy Based on the Probability of Task Completion. J. Comput. Sci. Technol. 27, 273–280 (2012). https://doi.org/10.1007/s11390-012-1222-3


  • DOI: https://doi.org/10.1007/s11390-012-1222-3
