Abstract
An important problem in the construction of fault-tolerant distributed database systems is the design of nonblocking transaction commit protocols. This problem has been extensively studied for synchronous systems (i.e., systems where no messages ever arrive late). In this paper, the synchrony assumption is relaxed. A new partially synchronous timing model is described. Developed for this model is a new nonblocking randomized transaction commit protocol, which incorporates an agreement protocol of Ben-Or. The new protocol works as long as fewer than half the processors fail. A matching lower bound is proved, showing that the number of processor faults tolerated is optimal. If half or more of the processors fail, the protocol degrades gracefully: it blocks, but no processor produces a wrong answer. A notion of asynchronous round is defined, and the protocol is shown to terminate in a small constant expected number of asynchronous rounds. In contrast, it is shown that no protocol in this model can guarantee that a processor terminates in a bounded expected number of its own steps, even if processors are synchronous.
References
Ben-Or M: Another advantage of free choice: Completely asynchronous agreement protocols. In: Proc 2nd Annu ACM Symp Principles Distrib Comput 1983, pp 27–30
Coan BA, Lundelius J: Transaction commit in a realistic fault model. In: Proc 5th Annu ACM Symp Principles Distrib Comput 1986, pp 40–51
Chor B, Merritt M, Shmoys D: Simple constant-time consensus protocols in realistic failure models. J ACM 36:591–614 (1989)
Dolev D, Dwork C, Stockmeyer L: On the minimal synchronism needed for distributed consensus. J ACM 34:77–97 (1987)
Dwork C, Lynch NA, Stockmeyer L: Consensus in the presence of partial synchrony. J ACM 35:288–323 (1988)
Dwork C, Skeen D: The inherent cost of nonblocking commitment. In: Proc 2nd Annu ACM Symp Principles Distrib Comput 1983, pp 1–11
Dwork C, Skeen D: Patterns of communication in consensus protocols. In: Proc 3rd Annu ACM Symp Principles Distrib Comput 1984, pp 143–153
Fischer MJ, Lynch NA, Paterson MS: Impossibility of distributed consensus with one faulty process. J ACM 32:374–382 (1985)
Gray J: Notes on database operating systems. In: Bayer R, Graham RM, Seegmüller G (eds) Operating systems: an advanced course. Lect Notes Comput Sci, vol 60. Springer, Berlin Heidelberg New York 1978, pp 393–481
Halpern JY, Moses YO: Knowledge and common knowledge in a distributed environment. In: Proc 3rd Annu ACM Symp Principles Distrib Comput 1984, pp 50–61 (revised as of Jan. 1986 as IBM-RJ-4421)
Rabin MO: Randomized Byzantine generals. In: Proc 24th Annu IEEE Symp Found Comput Sci 1983, pp 403–409
Skeen D: Crash recovery in a distributed database system. Ph.D. dissertation, University of California, Berkeley 1982 (available as UCB/ERL M82/45)
Author information
Additional information
Brian A. Coan received the B.S.E. degree in electrical engineering and computer science from Princeton University, Princeton, New Jersey, in 1977; the M.S. degree in computer engineering from Stanford University, Stanford, California, in 1979; and the Ph.D. degree in computer science from the Massachusetts Institute of Technology, Cambridge, Massachusetts, in 1987. He has worked for Amdahl Corporation and AT&T Bell Laboratories. Currently he is a member of the technical staff at Bellcore. His main research interest is fault tolerance in distributed systems.
Jennifer Lundelius Welch received her B.A. in 1979 from the University of Texas at Austin, and her S.M. and Ph.D. from the Massachusetts Institute of Technology in 1984 and 1988 respectively. She was a member of technical staff at GTE Laboratories Incorporated in Waltham, Massachusetts, from 1988 to 1989. She is currently an assistant professor at the University of North Carolina in Chapel Hill. Her research interests include algorithms and lower bounds for distributed computing.
The authors were with the MIT Laboratory for Computer Science when the bulk of this work was done. This work was supported in part by the Advanced Research Projects Agency of the Department of Defense under Contract N00014-83-K-0125, the National Science Foundation under Grant DCR-83-02391, the Office of Army Research under Contract DAAG29-84-K-0058, and the Office of Naval Research under Contract N00014-85-K-0168. A preliminary version of this paper appears in the Proceedings of the Fifth Annual ACM Symposium on Principles of Distributed Computing [2].
About this article
Cite this article
Coan, B.A., Welch, J.L. Transaction commit in a realistic timing model. Distrib Comput 4, 87–103 (1990). https://doi.org/10.1007/BF01786634
DOI: https://doi.org/10.1007/BF01786634