Fault-tolerant atomic computations in an object-based distributed system

Distributed Computing

Abstract

A distributed system can support fault-tolerant applications by replicating data and computation at nodes that have independent failure modes. We present a scheme called parallel execution threads (PET) which can be used to implement fault-tolerant computations in an object-based distributed system. In a system that replicates objects, the PET scheme can be used to replicate a computation by creating a number of parallel threads which execute with different replicas of the invoked objects. A computation can be completed successfully if at least one thread does not encounter any failed nodes and its completion preserves the consistency of the objects. The PET scheme can tolerate failures that occur during the execution of the computation as long as at least one thread is unaffected by the failures. We present the algorithms required to implement the PET scheme and also address some performance issues.
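
The idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm, which the abstract does not give. It only models the stated success condition: one thread runs per replica, and the computation succeeds if at least one thread meets no failed node. All names here (`Replica`, `NodeFailure`, `run_thread`, `pet_execute`) are hypothetical. The commit step, which copies the winning result to every reachable replica, is an assumed stand-in for whatever consistency-preserving completion the paper defines.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class NodeFailure(Exception):
    """Raised when a thread touches a replica whose node has failed (hypothetical)."""


class Replica:
    """One copy of a replicated object, hosted on a named node (hypothetical model)."""

    def __init__(self, node, value=0, failed=False):
        self.node = node
        self.value = value
        self.failed = failed

    def invoke(self, delta):
        # Compute a tentative result without modifying the replica.
        if self.failed:
            raise NodeFailure(self.node)
        return self.value + delta


def run_thread(replica, delta):
    """Body of one parallel execution thread: invoke the object on its own replica."""
    return replica.node, replica.invoke(delta)


def pet_execute(replicas, delta):
    """Run one thread per replica; succeed if at least one thread meets no failure."""
    outcome = None
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(run_thread, r, delta) for r in replicas]
        for fut in as_completed(futures):
            try:
                outcome = fut.result()
                break  # first thread that finished without a failure wins
            except NodeFailure:
                continue  # this thread hit a failed node; try another thread
    if outcome is None:
        raise RuntimeError("all threads encountered failed nodes")
    winner, result = outcome
    # Assumed commit step: install the winning result at every reachable replica.
    for r in replicas:
        if not r.failed:
            r.value = result
    return winner, result


if __name__ == "__main__":
    replicas = [Replica("n1", 10, failed=True), Replica("n2", 10), Replica("n3", 10)]
    print(pet_execute(replicas, 5))
```

In this sketch the threads only compute tentative results, and a single winner is committed after all threads have stopped, so two threads can never install conflicting updates. A real system would also need to handle failures during the commit itself, which the sketch ignores.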

Author information

Mustaque Ahamad received his B.E. (Hons.) degree in Electrical Engineering from the Birla Institute of Technology and Science, Pilani, India. He obtained his M.S. and Ph.D. degrees in Computer Science from the State University of New York at Stony Brook in 1983 and 1985, respectively. Since September 1985, he has been an Assistant Professor in the School of Information and Computer Science at the Georgia Institute of Technology, Atlanta. His research interests include distributed operating systems, distributed algorithms, fault-tolerant systems and performance evaluation.

Partha Dasgupta has been an Assistant Professor at Georgia Tech since 1984. He has a Ph.D. in Computer Science from the State University of New York at Stony Brook. He is the technical project director of the Clouds distributed operating systems project, as well as a co-principal investigator of Georgia Tech's NSF-CER award. His research interests include building distributed operating systems, distributed algorithms, fault-tolerant systems and distributed programming support.

Richard J. LeBlanc, Jr. received the B.S. degree in physics from Louisiana State University in 1972 and the M.S. and Ph.D. degrees in computer sciences from the University of Wisconsin-Madison in 1974 and 1977, respectively. He is currently a Professor in the School of Information and Computer Science of the Georgia Institute of Technology. His research interests include programming language design and implementation, programming environments, and software engineering. Dr. LeBlanc's current research work involves application of these interests in distributed processing systems. As co-director of the Clouds Project, he is studying language concepts and software engineering methodology for utilizing a highly reliable, object-based distributed system. He is also interested in specification-based software development methodologies and tools. Dr. LeBlanc is a member of the Association for Computing Machinery, the IEEE Computer Society and Sigma Xi.

This work was supported in part by NSF grants CCR-8619886 and CCR-8806358, and RADC contract number F30602-86-C-0032.

Cite this article

Ahamad, M., Dasgupta, P. & LeBlanc, R.J. Fault-tolerant atomic computations in an object-based distributed system. Distrib Comput 4, 69–80 (1990). https://doi.org/10.1007/BF01786632

  • DOI: https://doi.org/10.1007/BF01786632
