Abstract
An initial design for a fault-tolerant, distributed version of UNIX was presented in an earlier paper [2]. That design left open questions in two areas: fault tolerance for the server processes through which peripherals are accessed, and recovery after a crash, including the re-backup of processes. Since then, the fundamental design involving three-way message transmission has remained unchanged. However, server fault tolerance has been redesigned and is now more consistent with the fault tolerance of normal user processes. Recovery and re-backup have been completed in a more efficient manner than previously envisioned. In addition, important changes in the implementation have occurred. In this paper, we review the original design, borrowing heavily from the earlier paper in sections 1–3, and explain additions and modifications in later sections.
References
[1] Bartlett, J. A NonStop Kernel. Eighth Symposium on Operating Systems Principles, December 1981.
[2] Borg, A., Baumbach, J., Glazer, S. A Message System Supporting Fault Tolerance. Ninth Symposium on Operating Systems Principles, October 1983.
[3] Walter, B. A Robust and Efficient Protocol for Checking the Availability of Remote Sites. Sixth Workshop on Distributed Data Management and Computer Networks, December 1982.
Copyright information
© 1990 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Borg, A., Blau, W., Oberle, W., Graetsch, W. (1990). Fault tolerance in distributed UNIX. In: Simons, B., Spector, A. (eds) Fault-Tolerant Distributed Computing. Lecture Notes in Computer Science, vol 448. Springer, New York, NY. https://doi.org/10.1007/BFb0042339
DOI: https://doi.org/10.1007/BFb0042339
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-97385-2
Online ISBN: 978-0-387-34812-4
eBook Packages: Springer Book Archive