Abstract
We present a formal method based on graph rewriting systems for specifying and proving fault-tolerant distributed algorithms. Our method deals with crash failures: in a crash-failure system, a process can fail by crashing, i.e., by permanently halting. The faulty processes are those contaminated by the crashes. The methodology proceeds in two phases. In the first phase, we build the set of illegitimate configurations to specify the faults and the faulty processes. In the second phase, we add correction rules to the initial graph rewriting system that encodes the distributed algorithm. These rules detect and eliminate the faults locally during the computation. The method can be implemented on an asynchronous message-passing system that reports faults. To illustrate the approach, we present examples of fault-tolerant distributed spanning tree algorithms.
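The two phases described in the abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the paper's formal system: the labels "A" (in the tree) and "N" (not yet in the tree), the function names stabilize and is_spanning_tree, the rule that corrections take priority over construction, and the assumption that the root never crashes are all hypothetical choices made for this example. A configuration is treated as illegitimate when a non-root node labelled "A" has a parent that has crashed or is no longer labelled "A".

```python
def stabilize(graph, alive, label, parent, root):
    """Apply local rules until none applies (assumed rule set, for illustration).

    graph:  dict mapping each node to a list of its neighbours
    alive:  set of nodes that have not crashed (root is assumed to stay alive)
    label:  dict node -> "A" (in tree) or "N" (not in tree)
    parent: dict node -> parent node in the tree, or None
    """
    changed = True
    while changed:
        changed = False
        # Correction rule (given priority here): a non-root node in the tree
        # whose parent crashed or left the tree detaches itself locally.
        for v in sorted(alive):
            if v == root or label[v] != "A":
                continue
            p = parent[v]
            if p not in alive or label[p] != "A":
                label[v], parent[v] = "N", None
                changed = True
        if changed:
            continue  # finish all corrections before building again
        # Construction rule: a tree node adopts a live neighbour labelled "N".
        for u in sorted(alive):
            if label[u] != "A":
                continue
            for v in graph[u]:
                if v in alive and label[v] == "N":
                    label[v], parent[v] = "A", u
                    changed = True


def is_spanning_tree(graph, alive, label, parent, root):
    """Check that every live node reaches the root through live tree edges."""
    for v in alive:
        if label[v] != "A":
            return False
        cur, steps = v, 0
        while cur != root:
            p = parent[cur]
            if p not in alive or p not in graph[cur] or steps > len(alive):
                return False
            cur, steps = p, steps + 1
    return True


# Small example network: a ring 0-1-2-3 plus node 4 attached to 1 and 3.
graph = {0: [1, 3], 1: [0, 2, 4], 2: [1, 3], 3: [0, 2, 4], 4: [1, 3]}
alive = set(graph)
label = {v: "N" for v in graph}
parent = {v: None for v in graph}
label[0] = "A"                        # node 0 is the root

stabilize(graph, alive, label, parent, 0)   # build the initial tree
alive.discard(1)                             # node 1 crashes
stabilize(graph, alive, label, parent, 0)   # corrections, then rebuilding
```

Running the corrections to a fixpoint before any construction step is a design choice of this sketch: it ensures that the whole subtree below a crashed node is detached before any node re-attaches, which keeps a node from adopting one of its own stale descendants as a parent. If a crash disconnects the network, nodes cut off from the root simply remain labelled "N".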
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hamid, B., Mosbah, M. (2005). A Formal Model for Fault-Tolerance in Distributed Systems. In: Winther, R., Gran, B.A., Dahll, G. (eds) Computer Safety, Reliability, and Security. SAFECOMP 2005. Lecture Notes in Computer Science, vol 3688. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563228_9
DOI: https://doi.org/10.1007/11563228_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29200-5
Online ISBN: 978-3-540-32000-5
eBook Packages: Computer Science, Computer Science (R0)