On implementing omega in systems with weak reliability and synchrony assumptions

Distributed Computing

Abstract

We study the feasibility and cost of implementing Ω, a fundamental failure detector at the core of many algorithms, in systems with weak reliability and synchrony assumptions. Intuitively, Ω allows processes to eventually elect a common leader. We first give an algorithm that implements Ω in a weak system S where (a) except for some unknown timely process s, all processes may be arbitrarily slow or may crash, and (b) only the output links of s are eventually timely (all other links can be arbitrarily slow and lossy). Previously known algorithms for Ω worked only in systems that are strictly stronger than S in terms of reliability or synchrony assumptions. We next show that algorithms that implement Ω in system S are necessarily expensive in terms of communication complexity: all correct processes (except possibly one) must send messages forever; moreover, a quadratic number of links must carry messages forever. This result holds even for algorithms that tolerate at most one crash. Finally, we show that with a small additional assumption to system S (the existence of some unknown correct process whose links can be arbitrarily slow and lossy but fair), there is a communication-efficient algorithm for Ω such that eventually only one process (the elected leader) sends messages. Some recent experimental results indicate that two of the algorithms for Ω described in this paper can be used in dynamically-changing systems and work well in practice [Schiper, Toueg in Proceedings of the 38th International Conference on Dependable Systems and Networks, pp. 207–216 (2008)].
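To make the intuition behind Ω concrete, the following is a minimal sketch of a checker for the "eventually elect a common leader" property. It is not one of the algorithms from the paper. The function name `stabilization`, the trace layout, and the process names are hypothetical, chosen only for illustration. Ω is defined over infinite runs, so a check on a finite trace can only suggest that the outputs have stabilized; it cannot prove it.

```python
from typing import Dict, List, Optional, Set, Tuple


def stabilization(
    trace: Dict[str, List[str]], correct: Set[str]
) -> Optional[Tuple[str, int]]:
    """Return (leader, t) if, from step t to the end of the finite trace,
    every correct process outputs the same correct leader; otherwise None.

    trace[p][i] is the leader that process p outputs at step i
    (an assumed representation, not taken from the paper).
    """
    histories = [trace[p] for p in sorted(correct)]
    if not histories:
        return None
    # Only compare steps that every correct process has recorded.
    steps = min(len(h) for h in histories)
    if steps == 0:
        return None
    leader = histories[0][steps - 1]
    # Omega requires the common leader to be a correct process.
    if leader not in correct:
        return None
    # Walk backwards while all correct processes agree on the leader.
    t = steps
    while t > 0 and all(h[t - 1] == leader for h in histories):
        t -= 1
    if t == steps:
        return None  # no agreement even at the last step
    return leader, t


# p3 crashes early, so only p1 and p2 are correct in this toy run.
example = {
    "p1": ["p1", "p2", "p2", "p2"],
    "p2": ["p1", "p1", "p2", "p2"],
    "p3": ["p3", "p3"],
}
print(stabilization(example, {"p1", "p2"}))  # ('p2', 2)
```

In this toy run the two correct processes disagree at step 1 and then both output p2 from step 2 onward, so the checker reports p2 with stabilization step 2. The outputs of the crashed process p3 are ignored, because Ω constrains only the outputs of correct processes.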

References

  1. Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Type fairness and a comparison with other link fairness properties (in preparation)

  2. Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Stable leader election. In: Proceedings of the 15th International Symposium on Distributed Computing, pp. 108–122. LNCS, vol. 2180. Springer, Heidelberg (2001)

  3. Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: On implementing Omega with weak reliability and synchrony assumptions. In: Proceedings of the 22nd ACM Symposium on Principles of Distributed Computing, pp. 306–314 (2003)

  4. Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Communication-efficient leader election and consensus with limited link synchrony. In: Proceedings of the 23rd ACM Symposium on Principles of Distributed Computing, pp. 328–337 (2004)

  5. Bertier, M., Marin, O., Sens, P.: Implementation and performance evaluation of an adaptable failure detector. In: Proceedings of the 2002 International Conference on Dependable Systems and Networks, pp. 354–363 (2002)

  6. Castro, M., Liskov, B.: Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comp. Syst. 20(4), 398–461 (2002)

  7. Chandra, T.D., Griesemer, R., Redstone, J.: Paxos made live: an engineering perspective (invited talk). In: Proceedings of the 26th ACM Symposium on Principles of Distributed Computing, pp. 398–407 (2007)

  8. Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM 43(4), 685–722 (1996)

  9. Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225–267 (1996)

  10. Chen, W., Toueg, S., Aguilera, M.K.: On the quality of service of failure detectors. IEEE Trans. Comp. 51(5), 561–580 (2002)

  11. Chu, F.: Reducing Ω to ◊W. Inf. Process. Lett. 67(6), 289–293 (1998)

  12. Deianov, B., Toueg, S.: Failure detector service for dependable computing. In: Proceedings of the 2000 International Conference on Dependable Systems and Networks, pp. B14–B15 (2000)

  13. Delporte-Gallet, C., Fauconnier, H., Guerraoui, R.: Shared memory vs. message passing. Research Report IC/2003/77, EPFL (2003)

  14. Delporte-Gallet, C., Fauconnier, H., Guerraoui, R., Hadzilacos, V., Kouznetsov, P., Toueg, S.: The weakest failure detectors to solve certain fundamental problems in distributed computing. In: Proceedings of the 23rd ACM Symposium on Principles of Distributed Computing, pp. 338–346 (2004)

  15. Dolev, D., Dwork, C., Stockmeyer, L.: On the minimal synchronism needed for distributed consensus. J. ACM 34(1), 77–97 (1987)

  16. Dutta, P., Guerraoui, R.: Fast indulgent consensus with zero degradation. In: Proceedings of the 4th European Dependable Computing Conference, pp. 191–208. LNCS, vol. 2485. Springer, Heidelberg (2002)

  17. Dwork, C., Lynch, N.A., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988)

  18. Eisler, J., Hadzilacos, V., Toueg, S.: The weakest failure detector to solve nonuniform consensus. Distrib. Comput. 19(4), 335–359 (2007)

  19. Fernández, A., Raynal, M.: From an intermittent rotating star to a leader. Tech. Rep. 1810, IRISA, Université de Rennes, France (2006)

  20. Fetzer, C., Raynal, M., Tronel, F.: An adaptive failure detection protocol. In: Proceedings of the 2001 Pacific Rim International Symposium on Dependable Computing, pp. 146–153 (2001)

  21. Gafni, E., Lamport, L.: Disk Paxos. Distrib. Comput. 16(1), 1–20 (2003)

  22. Hutle, M., Malkhi, D., Schmid, U., Zhou, L.: Chasing the weakest system model for implementing Omega and consensus. Research Report 74/2005, Technische Universität Wien, Institut für Technische Informatik (2005)

  23. Jiménez, E., Arévalo, S., Fernández, A.: Implementing unreliable failure detectors with unknown membership. Inf. Process. Lett. 100(2), 60–63 (2006)

  24. Keidar, I., Rajsbaum, S.: On the cost of fault-tolerant consensus when there are no faults—a tutorial. Slides of tutorial presentation in the 21st ACM Symposium on Principles of Distributed Computing (2002)

  25. Lamport, L.: The part-time parliament. ACM Trans. Comp. Syst. 16(2), 133–169 (1998)

  26. Lamport, L.: Paxos made simple. SIGACT News 32(4), 18–25 (2001)

  27. Larrea, M., Arévalo, S., Fernández, A.: Efficient algorithms to implement unreliable failure detectors in partially synchronous systems. In: Proceedings of the 13th International Symposium on Distributed Algorithms, pp. 34–48. LNCS, vol. 1693. Springer, Heidelberg (1999). A revised version of this paper appeared in IEEE Trans. on Comp. 53(7):815–828, July 2004

  28. Larrea, M., Fernández, A., Arévalo, S.: Optimal implementation of the weakest failure detector for solving consensus. In: Proceedings of the 19th Symposium on Reliable Distributed Systems, pp. 52–59 (2000)

  29. Larrea, M., Fernández, A., Arévalo, S.: Eventually consistent failure detectors. J. Parallel Distrib. Comp. 65(3), 361–373 (2005)

  30. Malkhi, D., Oprea, F., Zhou, L.: Omega meets Paxos: leader election and stability without eventual timely links. In: Proceedings of the 19th International Conference on Distributed Computing, pp. 199–213. LNCS, vol. 3724. Springer, Heidelberg (2005)

  31. Mostefaoui, A., Mourgaya, E., Raynal, M.: Asynchronous implementation of failure detectors. In: Proceedings of the 2003 International Conference on Dependable Systems and Networks, pp. 351–360 (2003)

  32. Mostefaoui, A., Raynal, M.: Leader-based consensus. Parallel Process. Lett. 11(1), 95–107 (2001)

  33. Mostefaoui, A., Raynal, M., Travers, C.: Time-free and timer-based assumptions can be combined to obtain eventual leadership. IEEE Trans. Parallel Distrib. Syst. 17(7), 656–666 (2006)

  34. De Prisco, R., Lampson, B., Lynch, N.A.: Revisiting the Paxos algorithm. Theor. Comput. Sci. 243, 35–91 (2000)

  35. van Renesse, R., Minsky, Y., Hayden, M.: A gossip-style failure detection service. In: Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing, pp. 55–70 (1998)

  36. Schiper, N., Toueg, S.: A robust and lightweight stable Leader Election Service for dynamic systems. In: Proceedings of the 38th International Conference on Dependable Systems and Networks, pp. 207–216 (2008)

Additional information

This paper was originally invited to the special issue of Distributed Computing based on selected papers presented at the 22nd ACM Symposium on Principles of Distributed Computing (PODC 2003). It appears separately due to publication delays.

Research supported in part by the Natural Sciences and Engineering Research Council of Canada.

About this article

Cite this article

Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H. et al. On implementing omega in systems with weak reliability and synchrony assumptions. Distrib. Comput. 21, 285–314 (2008). https://doi.org/10.1007/s00446-008-0068-y

  • DOI: https://doi.org/10.1007/s00446-008-0068-y
