A continuum of failure models for distributed computing

Garay, Juan A.; Perry, Kenneth J.

doi:10.1007/3-540-56188-9_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 647)

Included in the following conference series:

International Workshop on Distributed Algorithms

Abstract

A range of models of distributed computing is presented in which processors may fail either by crashing or by exhibiting arbitrary (Byzantine) behavior. In these models, the total number of faulty processors is bounded from above by a constant t subject to the proviso that no more than b ≤ t of these processors are Byzantine. At the two extremes of the range (i.e., b = 0 or b = t) we get models that are equivalent to the traditional models of either pure crash failures or pure Byzantine failures. For 0 < b < t, the models that we introduce accommodate “real-world” experience that shows that the overwhelming majority of failures are crashes but occasionally some number of less-restrictive failures occur. We examine the Reliable Broadcast and Consensus problems within this new family of models and prove lower bounds on the relationship required between the number of processors, t, and b. We also present protocols to solve these problems, which match the lower bounds. In presenting the protocols, we emphasize new algorithmic techniques that are fruitful to use in the new models but which have limited value in either of the pure models.
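
The parameterization described in the abstract can be illustrated with a short sketch. It is not taken from the paper: the class name `FailureModel`, the method names `kind` and `admits`, and the choice to represent processors as set members are all assumptions made for illustration. The sketch only encodes the two constraints the abstract states (at most t faulty processors in total, at most b of them Byzantine) and does not encode the paper's lower bounds, which the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureModel:
    """One point in the continuum (hypothetical helper, not from the paper).

    t: upper bound on the total number of faulty processors.
    b: upper bound on how many of those may be Byzantine (0 <= b <= t).
    """
    t: int
    b: int

    def __post_init__(self):
        # The abstract requires b <= t; negative bounds make no sense.
        if not 0 <= self.b <= self.t:
            raise ValueError("require 0 <= b <= t")

    def kind(self) -> str:
        """Name the region of the continuum this model falls in."""
        if self.b == 0:
            return "pure crash"        # also covers the degenerate t == 0 case
        if self.b == self.t:
            return "pure Byzantine"
        return "mixed"                 # 0 < b < t

    def admits(self, crashed: set, byzantine: set) -> bool:
        """Check whether a concrete failure pattern is allowed by the model.

        Assumption: each faulty processor is counted in exactly one set,
        so overlapping sets are rejected.
        """
        if crashed & byzantine:
            return False
        total_faulty = len(crashed) + len(byzantine)
        return len(byzantine) <= self.b and total_faulty <= self.t


if __name__ == "__main__":
    model = FailureModel(t=3, b=1)
    print(model.kind())
    print(model.admits({"p1", "p2"}, {"p3"}))   # two crashes, one Byzantine
    print(model.admits({"p1"}, {"p2", "p3"}))   # two Byzantine exceeds b
```

With t = 3 and b = 1, a pattern of two crashes and one Byzantine processor is admitted, while a pattern with two Byzantine processors is rejected even though the total stays within t.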

Author information

Authors and Affiliations

Department of Applied Mathematics and Computer Science, The Weizmann Institute of Science, 76100, Rehovot, Israel
Juan A. Garay
I.B.M. T.J. Watson Research Center, P.O. Box 704, 10598, Yorktown Heights, New York
Juan A. Garay & Kenneth J. Perry

Editor information

Adrian Segall, Shmuel Zaks

About this paper

Cite this paper

Garay, J.A., Perry, K.J. (1992). A continuum of failure models for distributed computing. In: Segall, A., Zaks, S. (eds) Distributed Algorithms. WDAG 1992. Lecture Notes in Computer Science, vol 647. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56188-9_11

DOI: https://doi.org/10.1007/3-540-56188-9_11
Published: 04 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56188-0
Online ISBN: 978-3-540-47484-5
eBook Packages: Springer Book Archive
