The Introduction of Fault-Tolerance in a Hierarchical Operating System

  • Conference paper
Fehlertolerierende Rechensysteme

Part of the book series: Informatik-Fachberichte (INFORMATIK, volume 84)

Abstract

A general method for introducing fault-tolerance into a hierarchical operating system is presented. First, a hierarchically structured conventional (non-fault-tolerant) operating system is described. To transform it into a fault-tolerant system, each conventional machine is augmented with an Error Detection and Recovery (EDR) mechanism, yielding a corresponding fault-tolerant machine. From the standpoint of fault-tolerance, three types of machines are identified: physical, kernel, and process. The EDR mechanism makes a conventional machine fault-tolerant by transforming its conventional operations into fault-tolerant operations. To provide this transformation, a set of operations is defined for the EDR mechanism. A model for fault-tolerant operations is developed such that known fault-tolerance techniques (e.g., recovery blocks and N-version programming) can be represented as particular cases. The resulting general fault-tolerant operating system is a hierarchy of fault-tolerant machines, with the physical type machines at the bottom, the kernel type machines above them, and the process type machines at the top.
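
To make the two techniques named in the abstract concrete, here is a minimal Python sketch of a recovery block and of N-version programming, each wrapping conventional operations with error detection and recovery. It is not the model developed in the paper, which is not reproduced on this page. The names `recovery_block`, `n_version`, and `OperationFailed`, and the use of a dictionary to stand in for machine state, are assumptions made purely for illustration.

```python
from collections import Counter
from typing import Any, Callable, Dict, Sequence

State = Dict[str, Any]  # hypothetical stand-in for a machine's state


class OperationFailed(Exception):
    """Raised when error detection rejects every available variant."""


def recovery_block(
    alternates: Sequence[Callable[[State], Any]],
    acceptance_test: Callable[[Any], bool],
    state: State,
) -> Any:
    """Run alternates in order until one passes the acceptance test.

    The state is checkpointed on entry and restored before each attempt,
    so a failed alternate leaves no side effects behind (backward recovery).
    """
    checkpoint = dict(state)
    for alternate in alternates:
        state.clear()
        state.update(checkpoint)  # roll back to the checkpoint
        try:
            result = alternate(state)
        except Exception:
            continue  # an exception counts as a detected error
        if acceptance_test(result):
            return result
    state.clear()
    state.update(checkpoint)
    raise OperationFailed("no alternate passed the acceptance test")


def n_version(versions: Sequence[Callable[[State], Any]], state: State) -> Any:
    """Run every version on its own copy of the state and majority-vote."""
    results = []
    for version in versions:
        try:
            results.append(version(dict(state)))
        except Exception:
            pass  # a crashed version simply casts no vote
    if not results:
        raise OperationFailed("every version failed")
    value, votes = Counter(results).most_common(1)[0]
    if 2 * votes <= len(versions):
        raise OperationFailed("no strict majority among the versions")
    return value
```

Both functions have the same shape: a conventional operation goes in, and an operation that either returns an accepted result or signals failure comes out. They differ in how errors are detected (an acceptance test versus a vote) and in how recovery is done (rollback and retry versus masking by redundancy).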

Copyright information

© 1984 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Soneru, M.D., Huen, W.H. (1984). The Introduction of Fault-Tolerance in a Hierarchical Operating System. In: Großpietsch, KE., Dal Cin, M. (eds) Fehlertolerierende Rechensysteme. Informatik-Fachberichte, vol 84. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-69698-5_7

  • DOI: https://doi.org/10.1007/978-3-642-69698-5_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-13348-3

  • Online ISBN: 978-3-642-69698-5

  • eBook Packages: Springer Book Archive
