Abstract
A new class of fault tolerance techniques is introduced. Time-staggered redundancy is a modification of static redundancy (replication of processes and fault masking): some of the replicas are executed in parallel, the others with an adjustable delay. The delayed replicas contribute to n-out-of-m majority voting as usual, and also to backward error recovery. Because the delayed processes represent an earlier state of the process system, they can serve as a recovery point. Staggered execution of process copies thus provides static and dynamic redundancy at the same time, without additional checkpointing overhead. Since both comparison tests and acceptance tests can be applied, a higher degree of fault tolerance is achieved. Moreover, testing the results of the early processes detects when wrong input data have been processed. In this case improved input data are requested for the late processes. Finally, correct output data are chosen among the results of all processes (early and late ones). Time-staggered redundancy should be preferred if multiple faults of different types have to be tolerated, and if time redundancy is limited but sufficient for delayed process execution. In contrast to periodic or event-driven checkpointing, the available time redundancy can be used completely for backward error recovery at any time: the late processes serve as “computing recovery points” with “continuous checkpointing”.
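The interplay of early and late replicas described above can be illustrated with a small sketch. It is not the protocol from the paper, only a toy model under stated assumptions: the time stagger is modelled simply by running the early replicas before the late ones, and the names `run_staggered`, `vote`, `accept`, and `improve_input` are hypothetical. It also assumes that the input is treated as suspect only when no early result passes the acceptance test, and that in this case the output is chosen among the late results alone.

```python
from collections import Counter


def vote(results, needed):
    """Return the most common result if at least `needed` replicas agree, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= needed else None


def run_staggered(early_replicas, late_replicas, data, accept, improve_input):
    """Toy model of time-staggered redundancy (all names are illustrative).

    early_replicas, late_replicas: lists of callables computing the same function.
    accept: acceptance test applied to a single result.
    improve_input: hypothetical hook that returns corrected input data.
    """
    # Early replicas run first; here "first" just means earlier in program order.
    early = [replica(data) for replica in early_replicas]

    # Assumption: wrong input is suspected when no early result is acceptable.
    input_suspect = not any(accept(r) for r in early)

    if input_suspect:
        # Late replicas have not yet consumed the input, so they act as the
        # recovery point and are given improved input data.
        late = [replica(improve_input(data)) for replica in late_replicas]
        candidates = [r for r in late if accept(r)]
        needed = len(late_replicas) // 2 + 1
    else:
        # Normal case: late replicas take part in n-out-of-m voting as usual.
        late = [replica(data) for replica in late_replicas]
        candidates = [r for r in early + late if accept(r)]
        needed = (len(early_replicas) + len(late_replicas)) // 2 + 1

    return vote(candidates, needed)


def double(x):
    return 2 * x


def faulty_double(x):
    return 2 * x + 1  # a replica with a fault, masked by voting


def non_negative(result):
    return result >= 0
```

With two early replicas and one late replica, `run_staggered([double, faulty_double], [double], 4, non_negative, abs)` masks the faulty replica by 2-out-of-3 voting, while `run_staggered([double, double], [double], -3, non_negative, abs)` rejects both early results and recovers through the late replica with the improved input.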
© 1987 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Echtle, K. (1987). Fault Tolerance based on Time-Staggered Redundancy. In: Belli, F., Görke, W. (eds) Fehlertolerierende Rechensysteme / Fault-Tolerant Computing Systems. Informatik-Fachberichte, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45628-2_31
DOI: https://doi.org/10.1007/978-3-642-45628-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-18294-8
Online ISBN: 978-3-642-45628-2
eBook Packages: Springer Book Archive