Reaching agreement on processor-group membership in synchronous distributed systems

Distributed Computing

Abstract

Reaching agreement on the identity of correctly functioning processors of a distributed system in the presence of random communication delays, failures and processor joins is a fundamental problem in fault-tolerant distributed systems. Assuming a synchronous communication network that is not subject to partition occurrences, we specify the processor-group membership problem and we propose three simple protocols for solving it. The protocols provide all correct processors with consistent views of the processor-group membership and guarantee bounded processor failure detection and join delays.

Author information

Flaviu Cristian is a computer scientist at the IBM Almaden Research Center in San Jose, California. He received his PhD from the University of Grenoble, France, in 1979. After carrying out research in operating systems and programming methodology in France and working on the specification, design, and verification of fault-tolerant software in England, he joined IBM in 1982. Since then he has worked in the area of fault-tolerant distributed systems and protocols. He has participated in the design and implementation of a highly available distributed system prototype at the Almaden Research Center and has reviewed and consulted on several fault-tolerant distributed system designs in both the European and American divisions of IBM. He is now a technical leader in the design of a new Air Traffic Control System for the US, which must satisfy very stringent availability requirements.

About this article

Cite this article

Cristian, F. Reaching agreement on processor-group membership in synchronous distributed systems. Distrib Comput 4, 175–187 (1991). https://doi.org/10.1007/BF01784719

  • DOI: https://doi.org/10.1007/BF01784719
