On Failures and Faults

  • Conference paper
FME 2003: Formal Methods (FME 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2805)

Abstract.

Real computer-based systems fail, and hence are often far less dependable than their owners and users need and desire. Individuals, organisations and indeed the world at large are becoming more dependent on such systems, so there has been much work on trying to gain increased understanding of the many and varied types of faults that need to be prevented or tolerated in order to reduce the probability and severity of system failures. In this paper I analyze the concept of system faults and failures, and discuss the assumptions that are often made by computing system designers regarding faults, and a number of continuing research issues related to fault tolerance.

Much of this paper is based closely on material from my BCS/IEE 1999 Turing Memorial Lecture [19].

References

  1. Alexander, C.: Notes on the Synthesis of Form. Harvard University Press, Cambridge (1964)

  2. Anderson, R.: How to Cheat at the Lottery (or, Massively Parallel Requirements Engineering). In: Proc. Computer Security Applications Conference, Phoenix, AZ (1999)

  3. Avizienis, A., Laprie, J.C., Randell, B.: Fundamental Concepts of Dependability. In: Third IEEE Information Survivability Workshop, Cambridge, Mass. Software Engineering Institute, pp. 7–12. Carnegie-Mellon University, Pittsburgh (2000)

  4. Campbell, R.H., Randell, B.: Error Recovery in Asynchronous Systems. IEEE Trans. Software Engineering SE-12(8), 811–826

  5. Caughey, S.J., Little, M.C., Shrivastava, S.K.: Checked Transactions in an Asynchronous Message Passing Environment. In: 1st IEEE International Symposium on Object-Oriented Real-time Distributed Computing, Kyoto, pp. 222–229 (1998)

  6. Davies, C.T.: Data processing spheres of control. IBM Systems Journal 17(2), 179–198

  7. Dobson, J.E., Randell, B.: Building Reliable Secure Systems out of Unreliable Insecure Components. In: Proc. Conf. on Security and Privacy, Oakland. IEEE, Los Alamitos (1986)

  8. Gray, J., Reuter, A.: Transaction Processing: Concepts and techniques. Morgan Kaufmann, San Francisco (1993)

  9. Horning, J.J., Lauer, H.C., Melliar-Smith, P.M., Randell, B.: A Program Structure for Error Detection and Recovery. In: Proc. Conf. on Operating Systems, Theoretical and Practical Aspects. Lecture Notes in Computer Science, vol. 16, IRIA, pp. 171–187. Springer, Heidelberg (1974)

  10. Horning, J.J., Randell, B.: Process Structuring. ACM Computing Surveys 5(1), 5–30

  11. Jones, C.B.: A Formal Basis for some Dependability Notions. In: Aichernig, B.K., Maibaum, T. (eds.) Formal Methods at the Crossroads: from Panacea to Foundational Support. Springer, Heidelberg (2003)

  12. Laprie, J.C. (ed.): Dependability: Basic concepts and associated terminology. Springer, Heidelberg (1991)

  13. Laprie, J.C. (ed.): Dependability: Basic concepts and terminology — in English, French, German, Italian and Japanese. Springer, Vienna (1992)

  14. Laprie, J.C.: Dependable Computing: Concepts, Limits, Challenges. In: 25th IEEE International Symposium on Fault-Tolerant Computing - Special Issue, Pasadena, California, USA, pp. 42–54. IEEE, Los Alamitos (1995)

  15. Littlewood, B., Miller, D.R.: Conceptual Modelling of Coincident Failures in Multi-Version Software. IEEE Trans. Software Engineering 15(12), 1596–1614

  16. Lomet, D.B.: Process Structuring, Synchronization, and Recovery Using Atomic Actions. ACM SIGPLAN Notices 12(3), 128–137

  17. Naur, P., Randell, B. (eds.): Software Engineering: Report of a conference sponsored by the NATO Science Committee, Garmisch, Germany, October 7-11. Scientific Affairs Division, NATO, Brussels (1969)

  18. Neumann, P.: Computer Related Risks. Addison-Wesley, New York (1995)

  19. Randell, B.: Facing up to Faults (Turing Memorial Lecture). Computer Journal 43(2), 95–106

  20. Randell, B.: System Structure for Software Fault Tolerance. IEEE Trans. on Software Engineering SE-1(2), 220–232

  21. Romanovsky, A., Xu, J., Randell, B.: Exception Handling in Object-Oriented Real-Time Distributed Systems. In: Proc. 1st IEEE International Symposium on Object-Oriented Real-time Distributed Computing (ISORC 1998), Kyoto, Japan, pp. 32–42 (1998)

  22. von Neumann, J.: Probabilistic Logic and the Synthesis of Reliable Organisms from Unreliable Components. In: Shannon, C.E., McCarthy, J. (eds.) Automata Studies, pp. 43–98. Princeton University Press, Princeton (1956)

  23. Xu, J., Randell, B., Romanovsky, A., Stroud, R.J., Wu, Z.: Fault Tolerance in Concurrent Object-Oriented Software through Coordinated Error Recovery. In: Proc. 25th Int. Symp. Fault-Tolerant Computing (FTCS-25), Los Angeles. IEEE Computer Society Press, Los Alamitos (1995)

  24. Xu, J., Randell, B., Romanovsky, A., Stroud, R.J., Zorzo, A., Canver, E., Henke, F.v.: Developing Control Software for Production Cell II: Failure Analysis and System Design Using CA Actions. In: FTCS-29, Madison, USA. IEEE CS Press, Los Alamitos (1999)

  25. Xu, J., Randell, B., Romanovsky, A., Stroud, R.J., Zorzo, A.F., Canver, E., Henke, F.v.: Rigorous Development of a Safety-Critical System Based on Coordinated Atomic Actions. In: Proc. 29th Int. Symp. Fault-Tolerant Computing (FTCS-29), Madison. IEEE Computer Society Press, Los Alamitos (1999)

  26. Xu, J., Randell, B., Romanovsky, A., Stroud, R.J., Zorzo, A.F., Canver, E., Henke, F.v.: Rigorous development of an Embedded Fault-Tolerant System Based on Coordinated Atomic Actions. IEEE Trans. on Computers (Special Issue on Fault Tolerance) 51(2), 164–179

  27. Xu, J., Romanovsky, A., Randell, B.: Co-ordinated Exception Handling in Distributed Object Systems: from Model to System Implementation. In: Proc. 18th IEEE International Conference on Distributed Computing Systems, Amsterdam, Netherlands, pp. 12–21 (1998)

  28. Zorzo, A.F., Romanovsky, A., Xu, J., Randell, B., Stroud, R.J., Welch, I.S.: Using Coordinated Atomic Actions to Design Complex Safety-Critical Systems: The Production Cell Case Study. Software — Practice & Experience 29(8), 677–697

  29. Zurcher, F.W., Randell, B.: Iterative Multi-Level modelling: A methodology for computer system design. In: Proc. IFIP Congress 1968, Edinburgh, pp. D138-D142 (1968)

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Randell, B. (2003). On Failures and Faults. In: Araki, K., Gnesi, S., Mandrioli, D. (eds) FME 2003: Formal Methods. FME 2003. Lecture Notes in Computer Science, vol 2805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45236-2_3

  • DOI: https://doi.org/10.1007/978-3-540-45236-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40828-4

  • Online ISBN: 978-3-540-45236-2

  • eBook Packages: Springer Book Archive