Transient fault detectors

Extended abstract

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1499))

Abstract

In this paper we present failure detectors that detect transient failures, i.e., corruption of the system state that leaves the program of the processors intact. We distinguish a task, which is the problem to solve, from an implementation, which is the algorithm that solves the problem. A task is specified as a desired output of the distributed system. The mechanism used to produce this output is not a concern of the task but of the implementation.

In addition, we are able to classify both the distance locality and the history locality properties of tasks. The distance locality is related to the diameter of the system configuration that a failure detector has to maintain in order to detect a transient fault. The history locality is related to the number of consecutive system configurations that a failure detector has to maintain in order to detect a transient fault.
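The two locality notions can be illustrated with a small sketch. It is not the construction of the paper: the class `LocalDetector`, the helper `nodes_within`, and both example tasks (proper coloring, and an output that must stay fixed) are hypothetical choices made here for illustration, and the sketch assumes that "distance" can be read as the radius of the view a detector keeps around its own processor.

```python
from collections import deque

def nodes_within(graph, source, distance):
    """Breadth-first search: all nodes at most `distance` hops from `source`."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == distance:
            continue
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

class LocalDetector:
    """Hypothetical detector at one node.

    `distance` bounds how far the stored view reaches (distance locality);
    `history` bounds how many consecutive views are stored (history locality).
    """

    def __init__(self, node, graph, distance, history, predicate):
        self.node = node
        self.view_nodes = nodes_within(graph, node, distance)
        self.window = deque(maxlen=history)
        self.predicate = predicate

    def observe(self, config):
        """Store the visible part of `config`; return True if a fault is detected."""
        self.window.append({v: config[v] for v in self.view_nodes})
        if len(self.window) < self.window.maxlen:
            return False  # not enough history yet to evaluate the predicate
        return not self.predicate(self.node, list(self.window))

def properly_colored(graph):
    """Assumed task: a node's output must differ from each neighbor's output."""
    def check(node, window):
        latest = window[-1]
        return all(latest[node] != latest[n] for n in graph[node])
    return check

def output_unchanged(node, window):
    """Assumed task: a node's output must stay fixed across the stored views."""
    return all(view[node] == window[0][node] for view in window)

# Path a - b - c. Coloring needs a view of radius 1 and a single configuration.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
detectors = {v: LocalDetector(v, graph, 1, 1, properly_colored(graph)) for v in graph}
alarms = {v: d.observe({"a": 0, "b": 0, "c": 1}) for v, d in detectors.items()}
# alarms == {"a": True, "b": True, "c": False}

# The fixed-output task needs no neighbors but two consecutive configurations.
stable = LocalDetector("a", graph, 0, 2, output_unchanged)
history_alarms = [stable.observe({"a": x, "b": 0, "c": 0}) for x in (5, 5, 7)]
# history_alarms == [False, False, True]
```

In this sketch the coloring detectors fire only at the two processors that see the conflict, using one configuration of radius 1, whereas the fixed-output detector keeps no neighbor state at all but must remember two consecutive configurations before it can raise an alarm.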

Part of this research was done while visiting the Laboratoire de Recherche en Informatique, Bâtiment 490, Université de Paris Sud. Partly supported by the Israeli Ministry of Science and Arts grant #6756195.

Editor information

Shay Kutten

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Beauquier, J., Delaët, S., Dolev, S., Tixeuil, S. (1998). Transient fault detectors. In: Kutten, S. (eds) Distributed Computing. DISC 1998. Lecture Notes in Computer Science, vol 1499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0056474

  • DOI: https://doi.org/10.1007/BFb0056474

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65066-9

  • Online ISBN: 978-3-540-49693-9

  • eBook Packages: Springer Book Archive
