Abstract
Reaching agreement in a distributed system is a fundamental issue of both theoretical and practical importance. Consensus, Atomic Commitment, Atomic Broadcast, and Group Membership, which are different versions of this paradigm, underlie many existing fault-tolerant distributed systems. We describe these problems, explain their relationships, and state some fundamental results on their solvability, depending on the system model. We then review and compare basic techniques to circumvent impossibility results in asynchronous systems: randomization, models of partial synchrony, and unreliable failure detection.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Charron-Bost, B. (2001). Agreement Problems in Fault-Tolerant Distributed Systems. In: Pacholski, L., Ružička, P. (eds) SOFSEM 2001: Theory and Practice of Informatics. SOFSEM 2001. Lecture Notes in Computer Science, vol 2234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45627-9_2
DOI: https://doi.org/10.1007/3-540-45627-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42912-8
Online ISBN: 978-3-540-45627-8
eBook Packages: Springer Book Archive