Abstract
Fault tolerance will be a fundamental attribute of many future computing systems. We examine several technological trends and application requirements to justify this assertion. We identify some of the technical problems that must be solved before large, complex fault-tolerant applications can be developed reliably. We also discuss considerations for selecting operating system abstractions that support fault tolerance.
This work was supported in part by the Commission of the European Communities under ESPRIT Programme Basic Research Action Number 3092 (Predictably Dependable Computing Systems) and by the Italian Ministry of University, Research and Technology.
Copyright information
© 1991 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Babaoğlu, Ö. (1991). Fault tolerance support in future operating systems. In: Karshmer, A., Nehmer, J. (eds) Operating Systems of the 90s and Beyond. Lecture Notes in Computer Science, vol 563. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024537
DOI: https://doi.org/10.1007/BFb0024537
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54987-1
Online ISBN: 978-3-540-46630-7
eBook Packages: Springer Book Archive