Abstract
This paper proposes a checkpoint rollback strategy for real-time systems with double modular redundancy. Without relying on built-in fault detection or spare processors, the scheme recovers from both transient and permanent faults. Two comparisons are conducted at each checkpoint. First, the states stored in two consecutive checkpoints of one processor are compared to check the integrity of that processor. Second, the states of the two processors are compared to detect faults, and the system rolls back to the previous checkpoint whenever the logic of the proposed scheme requires it. The fault recovery scheme induces a Markov model, which is analyzed to obtain the probability that a task completes within its deadline. The number of checkpoints is then chosen to maximize this probability.
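The idea of choosing the number of checkpoints to maximize the completion probability can be illustrated with a minimal sketch. It does not reproduce the paper's Markov model. It assumes a much simpler setting: equally spaced checkpoints, transient faults only, independent Poisson fault arrivals on each processor, a fixed cost per checkpoint, and a rollback of exactly one segment whenever the two processors disagree. All names and parameters (`completion_probability`, `best_checkpoint_count`, `task_time`, `deadline`, `fault_rate`, `overhead`) are hypothetical.

```python
import math


def completion_probability(n, task_time, deadline, fault_rate, overhead):
    """Probability of finishing by the deadline with n checkpoints.

    Simplified model (an assumption, not the paper's Markov model):
    a segment is repeated until both processors run it fault-free.
    """
    segment = task_time / n + overhead       # one segment plus its checkpoint cost
    attempts = int(deadline // segment)      # segment executions that fit in the deadline
    if attempts < n:
        return 0.0                           # even a fault-free run misses the deadline
    # Neither of the two processors suffers a fault during one segment.
    p_ok = math.exp(-2.0 * fault_rate * segment)
    # The task completes if at least n of the available attempts succeed.
    return sum(
        math.comb(attempts, k) * p_ok**k * (1.0 - p_ok) ** (attempts - k)
        for k in range(n, attempts + 1)
    )


def best_checkpoint_count(task_time, deadline, fault_rate, overhead, max_n=50):
    """Return the n in 1..max_n with the highest completion probability."""
    return max(
        range(1, max_n + 1),
        key=lambda n: completion_probability(
            n, task_time, deadline, fault_rate, overhead
        ),
    )


if __name__ == "__main__":
    n_best = best_checkpoint_count(10.0, 15.0, 0.05, 0.2)
    print(n_best, completion_probability(n_best, 10.0, 15.0, 0.05, 0.2))
```

Even in this reduced setting the trade-off is visible: more checkpoints shorten the work lost on each rollback but add overhead, so the completion probability peaks at an intermediate number of checkpoints.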
Cite this article
Kwak, S.W., You, KH. & Yang, JM. Checkpoint Management with Double Modular Redundancy Based on the Probability of Task Completion. J. Comput. Sci. Technol. 27, 273–280 (2012). https://doi.org/10.1007/s11390-012-1222-3
DOI: https://doi.org/10.1007/s11390-012-1222-3