Abstract
When a group of processors in a distributed system cooperates on a common task, the non-faulty processors often need mutually consistent knowledge of which processors can be considered non-faulty. The set of non-faulty processors in the group, known as the group membership, changes, for example, when a processor crashes or when a crashed processor restarts and rejoins the group. All non-faulty processors should learn of these changes as quickly as possible, within a known bounded time interval. We present an algorithm by which the non-faulty processors of a group of bounded size can maintain consistent and timely knowledge of the group membership. Processors in the group are assumed to execute the algorithm synchronously, in periodic cycles of fixed length. In an execution of the proposed algorithm, every non-faulty processor learns of any processor failure within at most two cycles following the cycle in which the failure occurred, and a restarted processor can join the group in two cycles. Fewer than half of the processors are assumed to fail in any three consecutive cycles.
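The fault assumption and the detection bound stated in the abstract can be made concrete with a small sketch. This is not the paper's membership algorithm, which the abstract does not describe. It only checks whether a trace of per-cycle failure counts stays within the stated assumption and computes the latest cycle by which a failure should be known. The function names, the trace representation, and the reading of "within two cycles" as "no later than cycle c + 2" are assumptions made for illustration.

```python
from typing import Sequence


def fault_assumption_holds(n: int, failures_per_cycle: Sequence[int],
                           window: int = 3) -> bool:
    """Check the abstract's fault assumption on a failure trace.

    n is the group size and failures_per_cycle[i] is the number of
    processors that fail in cycle i (hypothetical representation).
    Returns True if fewer than half of the n processors fail in every
    run of `window` consecutive cycles.
    """
    for start in range(len(failures_per_cycle)):
        failed = sum(failures_per_cycle[start:start + window])
        # "Fewer than half" means failed < n / 2, i.e. 2 * failed < n.
        if 2 * failed >= n:
            return False
    return True


def latest_detection_cycle(failure_cycle: int, bound: int = 2) -> int:
    """Latest cycle by which every non-faulty processor should know of a
    failure that occurred in failure_cycle, assuming the bound is counted
    in whole cycles after the failure cycle."""
    return failure_cycle + bound
```

For a group of seven processors, a trace such as `[1, 1, 1, 0, 0]` satisfies the assumption, because at most three processors fail in any three consecutive cycles, while `[2, 1, 1, 0]` does not, because four of the seven fail in the first three cycles.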
Copyright information
© 1991 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de Lemos, R., Ezhilchelvan, P.D. (1991). Agreement on the group membership in synchronous distributed systems. In: van Leeuwen, J., Santoro, N. (eds) Distributed Algorithms. WDAG 1990. Lecture Notes in Computer Science, vol 486. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-54099-7_24
DOI: https://doi.org/10.1007/3-540-54099-7_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54099-1
Online ISBN: 978-3-540-47405-0
eBook Packages: Springer Book Archive