On implementing omega in systems with weak reliability and synchrony assumptions

Aguilera, Marcos K.; Delporte-Gallet, Carole; Fauconnier, Hugues; Toueg, Sam

doi:10.1007/s00446-008-0068-y

Published: 26 August 2008

Distributed Computing, volume 21, pages 285–314 (2008)

Abstract

We study the feasibility and cost of implementing Ω, a fundamental failure detector at the core of many algorithms, in systems with weak reliability and synchrony assumptions. Intuitively, Ω allows processes to eventually elect a common leader. We first give an algorithm that implements Ω in a weak system S where (a) except for some unknown timely process s, all processes may be arbitrarily slow or may crash, and (b) only the output links of s are eventually timely (all other links can be arbitrarily slow and lossy). Previously known algorithms for Ω worked only in systems that are strictly stronger than S in terms of reliability or synchrony assumptions. We next show that algorithms that implement Ω in system S are necessarily expensive in terms of communication complexity: all correct processes (except possibly one) must send messages forever; moreover, a quadratic number of links must carry messages forever. This result holds even for algorithms that tolerate at most one crash. Finally, we show that with a small additional assumption to system S (the existence of some unknown correct process whose links can be arbitrarily slow and lossy but fair), there is a communication-efficient algorithm for Ω such that eventually only one process (the elected leader) sends messages. Some recent experimental results indicate that two of the algorithms for Ω described in this paper can be used in dynamically-changing systems and work well in practice [Schiper, Toueg in Proceedings of the 38th International Conference on Dependable Systems and Networks, pp. 207–216 (2008)].
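
To make the intuition behind Ω concrete, the following is a minimal sketch of the property itself, not of any algorithm from the paper. Ω is defined over infinite runs, so a finite trace can only approximate it: the check below asks whether, at the end of a recorded trace, all correct processes output the same correct leader, and from which step that agreement has held. The names `stable_leader`, `stabilization_step`, and the trace layout are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Set

# Assumed trace layout (illustrative only): trace[t][p] is the leader
# that process p outputs at step t.
Trace = List[Dict[int, int]]


def stable_leader(trace: Trace, correct: Set[int]) -> Optional[int]:
    """Return the leader that all correct processes agree on at the end
    of the trace, or None if they disagree or the leader is not correct."""
    if not trace or not correct:
        return None
    last = trace[-1]
    outputs = {last[p] for p in correct}
    if len(outputs) != 1:
        return None  # correct processes still disagree
    leader = outputs.pop()
    # Omega requires the common leader to be a correct process.
    return leader if leader in correct else None


def stabilization_step(trace: Trace, correct: Set[int]) -> Optional[int]:
    """Return the first step from which every correct process outputs the
    final common leader without interruption, or None if there is none."""
    leader = stable_leader(trace, correct)
    if leader is None:
        return None
    t = len(trace)
    # Walk backwards while agreement on the final leader still holds.
    while t > 0 and all(trace[t - 1][p] == leader for p in correct):
        t -= 1
    return t


# Hypothetical run: process 1 crashes, processes 2 and 3 are correct.
example_trace: Trace = [
    {1: 1, 2: 2, 3: 3},
    {1: 1, 2: 1, 3: 3},
    {1: 2, 2: 2, 3: 2},
    {1: 2, 2: 2, 3: 2},
]
example_correct = {2, 3}
```

In this hypothetical run, processes 2 and 3 disagree for the first two steps and then both settle on process 2, so the check reports leader 2 with agreement holding from step 2 onward. A real implementation must reach this state using only the timeliness guarantees of system S, which is what the paper addresses.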

References

Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Type fairness and a comparison with other link fairness properties (in preparation)
Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Stable leader election. In: Proceedings of the 15th International Symposium on Distributed Computing, pp. 108–122. LNCS, vol. 2180. Springer, Heidelberg (2001)
Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: On implementing Omega with weak reliability and synchrony assumptions. In: Proceedings of the 22nd ACM Symposium on Principles of Distributed Computing, pp. 306–314 (2003)
Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Communication-efficient leader election and consensus with limited link synchrony. In: Proceedings of the 23rd ACM Symposium on Principles of Distributed Computing, pp. 328–337 (2004)
Bertier, M., Marin, O., Sens, P.: Implementation and performance evaluation of an adaptable failure detector. In: Proceedings of the 2002 International Conference on Dependable Systems and Networks, pp. 354–363 (2002)
Castro, M., Liskov, B.: Practical byzantine fault tolerance and proactive recovery. ACM Trans. Comp. Syst. 20(4), 398–461 (2002)
Chandra, T.D., Griesemer, R., Redstone, J.: Paxos made live: an engineering perspective (invited talk). In: Proceedings of the 26th ACM Symposium on Principles of Distributed Computing, pp. 398–407 (2007)
Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM 43(4), 685–722 (1996)
Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225–267 (1996)
Chen, W., Toueg, S., Aguilera, M.K.: On the quality of service of failure detectors. IEEE Trans. Comp. 51(5), 561–580 (2002)
Chu, F.: Reducing Ω to ◊W. Inf. Process. Lett. 67(6), 289–293 (1998)
Deianov, B., Toueg, S.: Failure detector service for dependable computing. In: Proceedings of the 2000 International Conference on Dependable Systems and Networks, pp. B14–B15 (2000)
Delporte-Gallet, C., Fauconnier, H., Guerraoui, R.: Shared memory vs. message passing. Research Report IC/2003/77, EPFL (2003)
Delporte-Gallet, C., Fauconnier, H., Guerraoui, R., Hadzilacos, V., Kouznetsov, P., Toueg, S.: The weakest failure detectors to solve certain fundamental problems in distributed computing. In: Proceedings of the 23rd ACM Symposium on Principles of Distributed Computing, pp. 338–346 (2004)
Dolev, D., Dwork, C., Stockmeyer, L.: On the minimal synchronism needed for distributed consensus. J. ACM 34(1), 77–97 (1987)
Dutta, P., Guerraoui, R.: Fast indulgent consensus with zero degradation. In: Proceedings of the 4th European Dependable Computing Conference, pp. 191–208. LNCS, vol. 2485. Springer, Heidelberg (2002)
Dwork, C., Lynch, N.A., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988)
Eisler, J., Hadzilacos, V., Toueg, S.: The weakest failure detector to solve nonuniform consensus. Distrib. Comput. 19(4), 335–359 (2007)
Fernández, A., Raynal, M.: From an intermittent rotating star to a leader. Tech. Rep. 1810, IRISA, Université de Rennes, France (2006)
Fetzer, C., Raynal, M., Tronel, F.: An adaptive failure detection protocol. In: Proceedings of the 2001 Pacific Rim International Symposium on Dependable Computing, pp. 146–153 (2001)
Gafni, E., Lamport, L.: Disk Paxos. Distrib. Comput. 16(1), 1–20 (2003)
Hutle, M., Malkhi, D., Schmid, U., Zhou, L.: Chasing the weakest system model for implementing Omega and consensus. Research Report 74/2005, Technische Universität Wien, Institut für Technische Informatik (2005)
Jiménez, E., Arévalo, S., Fernández, A.: Implementing unreliable failure detectors with unknown membership. Inf. Process. Lett. 100(2), 60–63 (2006)
Keidar, I., Rajsbaum, S.: On the cost of fault-tolerant consensus when there are no faults: a tutorial. Slides of tutorial presentation in the 21st ACM Symposium on Principles of Distributed Computing (2002)
Lamport, L.: The part-time parliament. ACM Trans. Comp. Syst. 16(2), 133–169 (1998)
Lamport, L.: Paxos made simple. SIGACT News 32(4), 18–25 (2001)
Larrea, M., Arévalo, S., Fernández, A.: Efficient algorithms to implement unreliable failure detectors in partially synchronous systems. In: Proceedings of the 13th International Symposium on Distributed Algorithms, pp. 34–48. LNCS, vol. 1693. Springer, Heidelberg (1999). A revised version of this paper appeared in IEEE Trans. on Comp. 53(7):815–828, July 2004
Larrea, M., Fernández, A., Arévalo, S.: Optimal implementation of the weakest failure detector for solving consensus. In: Proceedings of the 19th Symposium on Reliable Distributed Systems, pp. 52–59 (2000)
Larrea, M., Fernández, A., Arévalo, S.: Eventually consistent failure detectors. J. Parallel Distrib. Comp. 65(3), 361–373 (2005)
Malkhi, D., Oprea, F., Zhou, L.: Omega meets Paxos: leader election and stability without eventual timely links. In: Proceedings of the 19th International Conference on Distributed Computing, pp. 199–213. LNCS, vol. 3724. Springer, Heidelberg (2005)
Mostefaoui, A., Mourgaya, E., Raynal, M.: Asynchronous implementation of failure detectors. In: Proceedings of the 2003 International Conference on Dependable Systems and Networks, pp. 351–360 (2003)
Mostefaoui, A., Raynal, M.: Leader-based consensus. Parallel Process. Lett. 11(1), 95–107 (2001)
Mostefaoui, A., Raynal, M., Travers, C.: Time-free and timer-based assumptions can be combined to obtain eventual leadership. IEEE Trans. Parallel Distrib. Syst. 17(7), 656–666 (2006)
De Prisco, R., Lampson, B., Lynch, N.A.: Revisiting the Paxos algorithm. Theor. Comput. Sci. 243, 35–91 (2000)
van Renesse, R., Minsky, Y., Hayden, M.: A gossip-style failure detection service. In: Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing, pp. 55–70 (1998)
Schiper, N., Toueg, S.: A robust and lightweight stable Leader Election Service for dynamic systems. In: Proceedings of the 38th International Conference on Dependable Systems and Networks, pp. 207–216 (2008)

Author information

Marcos K. Aguilera
Present address: Microsoft Research Silicon Valley, 1065 La Avenida, Mountain View, CA, 94043, USA

Authors and Affiliations

HP Laboratories, 1501 Page Mill Road, Palo Alto, CA, 94304, USA
Marcos K. Aguilera
LIAFA, Université Paris Diderot-Paris 7, Case 7014, 75205, Paris Cedex 13, France
Carole Delporte-Gallet & Hugues Fauconnier
Department of Computer Science, University of Toronto, Toronto, ON, M5S 3G4, Canada
Sam Toueg

Additional information

This paper was originally invited to the special issue of Distributed Computing based on selected papers presented at the 22nd ACM Symposium on Principles of Distributed Computing (PODC 2003). It appears separately due to publication delays.

Research supported in part by the Natural Sciences and Engineering Research Council of Canada.

About this article

Cite this article

Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H. et al. On implementing omega in systems with weak reliability and synchrony assumptions. Distrib. Comput. 21, 285–314 (2008). https://doi.org/10.1007/s00446-008-0068-y

Received: 28 November 2003
Accepted: 03 June 2008
Published: 26 August 2008
Issue Date: October 2008
DOI: https://doi.org/10.1007/s00446-008-0068-y
