A Formal Model for Fault-Tolerance in Distributed Systems

  • Conference paper
Computer Safety, Reliability, and Security (SAFECOMP 2005)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 3688)

Abstract

We present a formal method, based on graph rewriting systems, for specifying fault-tolerant distributed algorithms and proving them correct. The method deals with crash failures: in a crash-failure system, a process can fail by crashing, i.e., by halting permanently. The faulty processes are those contaminated by the crashes. The methodology is formalized in two phases. In the first phase, we build the set of illegitimate configurations, which specifies the faults and the faulty processes. In the second phase, we add correction rules to the initial graph rewriting system that encodes the distributed algorithm. These rules detect and eliminate the faults locally during the computation. The method can be implemented on an asynchronous message-passing system that provides fault notification. To illustrate the approach, we present examples of fault-tolerant distributed spanning tree algorithms.
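
The two phases can be sketched with a small sequential simulation. The abstract does not give the paper's rules, so everything below is an assumption made for illustration: the function name relabel_until_stable, the encoding of the "in the tree" label as membership in a parent map, a root that never crashes, and a global priority of the correction rule over the construction rule (a simplification that a real asynchronous system would have to enforce locally).

```python
def relabel_until_stable(edges, root, crashed=frozenset(), parent=None):
    """Apply relabeling rules until none applies; return the parent map.

    Hypothetical encoding: a node is "in the tree" iff it is a key of
    `parent`; the root is its own parent and is assumed never to crash.
    """
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    alive = set(nbrs) - set(crashed)

    # Crashed nodes lose their label; this models the fault notification.
    parent = {v: p for v, p in (parent or {}).items() if v in alive}
    parent[root] = root

    while True:
        # Phase 1: illegitimate configurations. A non-root tree node is
        # faulty when its parent has crashed or has itself been reset.
        faulty = [v for v in sorted(parent)
                  if v != root and parent[v] not in parent]
        if faulty:
            # Phase 2: correction rule (has priority). Reset the node
            # locally so the construction rule can re-attach it later.
            del parent[faulty[0]]
            continue

        # Construction rule of the original algorithm: a tree node
        # adopts a live neighbor that is not yet in the tree.
        steps = [(u, v) for u in sorted(parent)
                 for v in sorted(nbrs[u])
                 if v in alive and v not in parent]
        if not steps:
            return parent
        u, v = steps[0]
        parent[v] = u


# A 4-cycle: build a spanning tree, then crash node 1 and repair.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
tree = relabel_until_stable(ring, root=0)
repaired = relabel_until_stable(ring, root=0, crashed={1}, parent=tree)
```

Because the correction rule runs first, the whole subtree orphaned by a crash is reset before any node is re-attached, which keeps the rebuilt parent map free of cycles in this simulation. Nodes disconnected from the root by the crash simply stay outside the tree.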

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hamid, B., Mosbah, M. (2005). A Formal Model for Fault-Tolerance in Distributed Systems. In: Winther, R., Gran, B.A., Dahll, G. (eds) Computer Safety, Reliability, and Security. SAFECOMP 2005. Lecture Notes in Computer Science, vol 3688. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563228_9

  • DOI: https://doi.org/10.1007/11563228_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29200-5

  • Online ISBN: 978-3-540-32000-5

  • eBook Packages: Computer Science, Computer Science (R0)
