Failure Detectors

  • Reference work entry

Years and Authors of Summarized Original Work

  • 1996; Chandra, Toueg

Problem Definition

A distributed system consists of a collection of processes. The processes typically seek to achieve some common task by communicating through message passing or shared memory. Most interesting tasks require, at least at certain points of the computation, some form of agreement between the processes. An abstract form of such agreement is consensus, in which processes must agree on a single value among a set of proposed values. Solving this seemingly elementary problem is at the heart of reliable distributed computing and, in particular, of distributed database commitment, total ordering of messages, and emulations of many shared object types.
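To make the agreement requirement concrete, the following is a minimal sketch (not taken from the original entry) that checks a single finished run of some consensus protocol against the standard formulation of consensus: validity (every decided value was proposed), agreement (no two processes decide differently), and termination (every correct process decides). The function name `check_consensus_run` and its input format (dictionaries keyed by hypothetical integer process ids, plus a set of processes assumed correct) are illustrative assumptions. The checker only inspects a trace after the fact; it does not solve consensus.

```python
from typing import Dict, Hashable, Set


def check_consensus_run(
    proposals: Dict[int, Hashable],
    decisions: Dict[int, Hashable],
    correct: Set[int],
) -> Set[str]:
    """Return the names of the consensus properties violated by one finished run.

    Hypothetical input format (an assumption for illustration):
      proposals: process id -> value that process proposed
      decisions: process id -> value it decided (processes that never decided are absent)
      correct:   ids of processes assumed not to have crashed in this run
    """
    violations: Set[str] = set()
    proposed_values = set(proposals.values())

    # Validity: a decided value must be one of the proposed values.
    if any(value not in proposed_values for value in decisions.values()):
        violations.add("validity")

    # Agreement: all decisions, by any process, carry the same single value.
    if len(set(decisions.values())) > 1:
        violations.add("agreement")

    # Termination, checked only on this finite trace:
    # every process assumed correct must have decided.
    if not set(correct).issubset(decisions):
        violations.add("termination")

    return violations
```

For example, with proposals `{1: "commit", 2: "abort", 3: "commit"}`, a run in which all three processes decide `"commit"` yields `set()`, while a run in which process 1 decides `"commit"` and process 2 decides `"abort"` yields `{'agreement'}`. Termination is a liveness property over infinite executions; checking it on a finite trace, as here, is only an approximation.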

Fischer, Lynch, and Paterson's seminal result in the theory of distributed computing [13] states that consensus cannot be deterministically solved in an asynchronous distributed system that is prone to process failures. This impossibility holds consequently for...

Recommended Reading

  1. Aguilera MK, Delporte-Gallet C, Fauconnier H, Toueg S (2003) On implementing omega with weak reliability and synchrony assumptions. In: Proceedings of the 22nd ACM symposium on principles of distributed computing, pp 306–314

  2. Bertier M, Marin O, Sens P (2003) Performance analysis of a hierarchical failure detector. In: Proceedings 2003 international conference on dependable systems and networks (DSN 2003), San Francisco, 22–25 June 2003, pp 635–644

  3. Borowsky E, Gafni E (1993) Generalized FLP impossibility result for t-resilient asynchronous computations. In: Proceedings of the 25th ACM symposium on theory of computing. ACM, pp 91–100

  4. Chandra TD, Hadzilacos V, Toueg S (1996) The weakest failure detector for solving consensus. J ACM 43(4):685–722

  5. Chandra TD, Toueg S (1996) Unreliable failure detectors for reliable distributed systems. J ACM 43(2):225–267

  6. Chaudhuri S (1993) More choices allow more faults: set consensus problems in totally asynchronous systems. Inf Comput 105(1):132–158

  7. Chen W, Toueg S, Aguilera MK (2002) On the quality of service of failure detectors. IEEE Trans Comput 51(1):13–32

  8. Delporte-Gallet C, Fauconnier H, Guerraoui R (2002) Failure detection lower bounds on registers and consensus. In: Proceedings of the 16th international symposium on distributed computing, LNCS, vol 2508

  9. Delporte-Gallet C, Fauconnier H, Guerraoui R (2005) Implementing atomic objects in a message passing system. Technical report, EPFL Lausanne

  10. Dwork C, Lynch NA, Stockmeyer L (1988) Consensus in the presence of partial synchrony. J ACM 35(2):288–323

  11. Felber P, Guerraoui R, Fayad M (1999) Putting OO distributed programming to work. Commun ACM 42(11):97–101

  12. Fernández A, Jiménez E, Raynal M (2006) Eventual leader election with weak assumptions on initial knowledge, communication reliability and synchrony. In: Proceedings of the international symposium on dependable systems and networks (DSN), pp 166–178

  13. Fischer MJ, Lynch NA, Paterson MS (1985) Impossibility of distributed consensus with one faulty process. J ACM 32(2):374–382

  14. Guerraoui R (2000) Indulgent algorithms. In: Proceedings of the 19th annual ACM symposium on principles of distributed computing, ACM, Portland, pp 289–297

  15. Herlihy M (1991) Wait-free synchronization. ACM Trans Program Lang Syst 13(1):123–149

  16. Herlihy M, Shavit N (1993) The asynchronous computability theorem for t-resilient tasks. In: Proceedings of the 25th ACM symposium on theory of computing, pp 111–120

  17. Keidar I, Rajsbaum S (2002) On the cost of fault-tolerant consensus when there are no faults: a tutorial. Tutorial at the 21st ACM symposium on principles of distributed computing

  18. Lamport L (1998) The part-time parliament. ACM Trans Comput Syst 16(2):133–169

  19. Lo W-K, Hadzilacos V (1994) Using failure detectors to solve consensus in asynchronous shared memory systems. In: Proceedings of the 8th international workshop on distributed algorithms. LNCS, vol 857, pp 280–295

  20. Lynch N (1996) Distributed algorithms. Morgan Kaufmann

  21. Raynal M, Travers C (2006) In search of the holy grail: looking for the weakest failure detector for wait-free set agreement. Technical Report TR 06-1811, INRIA

  22. Saks M, Zaharoglou F (1993) Wait-free k-set agreement is impossible: the topology of public knowledge. In: Proceedings of the 25th ACM symposium on theory of computing, ACM, pp 101–110

Author information

Correspondence to Rachid Guerraoui.

Copyright information

© 2016 Springer Science+Business Media New York

About this entry

Cite this entry

Guerraoui, R. (2016). Failure Detectors. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_140