The Introduction of Fault-Tolerance in a Hierarchical Operating System

Soneru, Marius D.; Huen, Wing H.

doi:10.1007/978-3-642-69698-5_7

Part of the book series: Informatik-Fachberichte (INFORMATIK, volume 84)

Abstract

This paper presents a general method for introducing fault-tolerance into a hierarchical operating system. First, a hierarchically structured conventional (non-fault-tolerant) operating system is described. To transform it into a fault-tolerant system, each conventional machine is augmented with an Error Detection and Recovery (EDR) mechanism, yielding a corresponding fault-tolerant machine. From the standpoint of fault-tolerance, three types of machines are identified: physical, kernel, and process. The EDR mechanism makes a conventional machine fault-tolerant by transforming its conventional operations into fault-tolerant operations; a set of operations is defined for the EDR mechanism to provide this transformation. A model for fault-tolerant operations is developed in which known fault-tolerance techniques (e.g. recovery blocks and N-version programming) can be represented as particular cases. The resulting general fault-tolerant operating system is a hierarchy of fault-tolerant machines, with the physical type machines at the bottom, the kernel type machines above them, and the process type machines at the top.
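
The two techniques named in the abstract can be illustrated with a short Python sketch. This is not the authors' EDR model, which is not reproduced on this page; it is a generic illustration, under stated assumptions, of how a conventional operation might be wrapped into a fault-tolerant one. The names `recovery_block`, `n_version`, and `OperationFailed` are hypothetical, the machine state is assumed to be a flat dictionary, and results are assumed to be hashable so they can be voted on.

```python
from collections import Counter


class OperationFailed(Exception):
    """Raised when error detection succeeds but recovery does not."""


def recovery_block(state, alternates, acceptance_test):
    """Recovery block: try alternates in order until one passes the test.

    The state is checkpointed on entry and restored before every attempt,
    so a rejected alternate leaves no side effects behind.
    """
    checkpoint = dict(state)  # shallow copy; assumes a flat state dict
    for alternate in alternates:
        state.clear()
        state.update(checkpoint)  # backward error recovery
        try:
            result = alternate(state)
        except Exception:
            continue  # an exception counts as a detected error
        if acceptance_test(result):
            return result
    state.clear()
    state.update(checkpoint)
    raise OperationFailed("every alternate was rejected")


def n_version(versions, *args):
    """N-version programming: run all versions and take a majority vote."""
    results = []
    for version in versions:
        try:
            results.append(version(*args))
        except Exception:
            pass  # a crashed version simply casts no vote
    if results:
        value, votes = Counter(results).most_common(1)[0]
        if votes > len(versions) // 2:
            return value
    raise OperationFailed("no majority among the versions")
```

In this sketch both wrappers expose the same shape: a conventional operation goes in, and a fault-tolerant operation with explicit error detection (acceptance test or vote) and recovery (rollback or masking) comes out. That shared shape is the sense in which a single model could cover both techniques as particular cases.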

Author information

Authors and Affiliations

AT&T Bell Laboratories, Naperville, USA
Marius D. Soneru & Wing H. Huen

Editor information

Editors and Affiliations

Institut für Systemtechnik (F2), Gesellschaft für Mathematik und Datenverarbeitung mbH Bonn, Schloß Birlinghoven, Postfach 1240, 5205, St. Augustin, Germany
K.-E. Großpietsch
Institut für Informationsverarbeitung, Universität Tübingen, Köstlinstraße 6, 7400, Tübingen, Germany
M. Dal Cin

About this paper

Cite this paper

Soneru, M.D., Huen, W.H. (1984). The Introduction of Fault-Tolerance in a Hierarchical Operating System. In: Großpietsch, KE., Dal Cin, M. (eds) Fehlertolerierende Rechensysteme. Informatik-Fachberichte, vol 84. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-69698-5_7

DOI: https://doi.org/10.1007/978-3-642-69698-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-13348-3
Online ISBN: 978-3-642-69698-5
eBook Packages: Springer Book Archive
