Abstract
A general method for introducing fault-tolerance into a hierarchical operating system is presented. First, a hierarchically structured conventional (non-fault-tolerant) operating system is described. To transform it into a fault-tolerant system, each conventional machine is augmented with an Error Detection and Recovery (EDR) mechanism, yielding a corresponding fault-tolerant machine. From the standpoint of fault-tolerance, three types of machines are identified: physical, kernel, and process. The EDR mechanism makes a conventional machine fault-tolerant by transforming its conventional operations into fault-tolerant operations. To provide this transformation, a set of operations is defined for the EDR mechanism. A model for fault-tolerant operations is developed such that known fault-tolerance techniques (e.g. recovery block, N-version programming) can be represented as particular cases. The resulting general fault-tolerant operating system is a hierarchy of fault-tolerant machines, with the physical type machines at the bottom, the kernel type machines above them, and the process type machines at the top.
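The abstract names recovery block and N-version programming as particular cases of one model of fault-tolerant operations. The following is a minimal illustrative sketch of those two known techniques, not the authors' EDR model, which the abstract does not specify. All names (`recovery_block`, `n_version`, `OperationFailed`, `accept`, `buggy_isqrt`) are hypothetical, and the operations are assumed to be pure functions, so restoring state at a recovery point is not modelled.

```python
import math
from collections import Counter


class OperationFailed(Exception):
    """Raised when error detection succeeds but recovery does not."""


def recovery_block(alternates, acceptance_test, x):
    """Run alternates in order; return the first result that passes the test."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue  # error detected by exception: fall through to next alternate
        if acceptance_test(x, result):
            return result  # error detection passed: accept this result
    raise OperationFailed("every alternate was rejected")


def n_version(versions, x):
    """Run all versions and return the strict-majority result, if one exists."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            pass  # a crashed version simply casts no vote
    if results:
        value, votes = Counter(results).most_common(1)[0]
        if 2 * votes > len(versions):
            return value
    raise OperationFailed("no majority among versions")


# Hypothetical conventional operation: integer square root.
def buggy_isqrt(x):
    return x // 2  # deliberately faulty implementation


def accept(x, r):
    # Acceptance test: r is the integer square root of x.
    return r * r <= x < (r + 1) * (r + 1)


rb_result = recovery_block([buggy_isqrt, math.isqrt], accept, 16)
nv_result = n_version([math.isqrt, buggy_isqrt, math.isqrt], 16)
```

In both functions the conventional operation is wrapped rather than modified: the recovery block detects errors with an acceptance test and recovers by trying the next alternate, while N-version programming detects and masks errors by voting.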
Copyright information
© 1984 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Soneru, M.D., Huen, W.H. (1984). The Introduction of Fault-Tolerance in a Hierarchical Operating System. In: Großpietsch, KE., Dal Cin, M. (eds) Fehlertolerierende Rechensysteme. Informatik-Fachberichte, vol 84. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-69698-5_7
DOI: https://doi.org/10.1007/978-3-642-69698-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-13348-3
Online ISBN: 978-3-642-69698-5
eBook Packages: Springer Book Archive