Abstract.
Real computer-based systems fail, and hence are often far less dependable than their owners and users need and desire. Individuals, organisations and indeed the world at large are becoming more dependent on such systems, so there has been much work on trying to gain increased understanding of the many and varied types of faults that need to be prevented or tolerated in order to reduce the probability and severity of system failures. In this paper I analyse the concept of system faults and failures, and discuss both the assumptions that computing system designers often make about faults and a number of continuing research issues related to fault tolerance.
Much of this paper is based closely on some of the material in my BCS/IEE 1999 Turing Memorial Lecture [19].
References
Alexander, C.: Notes on the Synthesis of Form. Harvard University Press, Cambridge (1964)
Anderson, R.: How to Cheat at the Lottery (or, Massively Parallel Requirements Engineering). In: Proc. Computer Security Applications Conference, Phoenix, AZ (1999)
Avizienis, A., Laprie, J.C., Randell, B.: Fundamental Concepts of Dependability. In: Third IEEE Information Survivability Workshop, Cambridge, Mass. Software Engineering Institute, pp. 7–12. Carnegie-Mellon University, Pittsburgh (2000)
Campbell, R.H., Randell, B.: Error Recovery in Asynchronous Systems. IEEE Trans. Software Engineering SE-12(8), 811–826 (1986)
Caughey, S.J., Little, M.C., Shrivastava, S.K.: Checked Transactions in an Asynchronous Message Passing Environment. In: 1st IEEE International Symposium on Object-Oriented Real-time Distributed Computing, Kyoto, pp. 222–229 (1998)
Davies, C.T.: Data processing spheres of control. IBM Systems Journal 17(2), 179–198 (1978)
Dobson, J.E., Randell, B.: Building Reliable Secure Systems out of Unreliable Insecure Components. In: Proc. Conf. on Security and Privacy, Oakland. IEEE, Los Alamitos (1986)
Gray, J., Reuter, A.: Transaction Processing: Concepts and techniques. Morgan Kaufmann, San Francisco (1993)
Horning, J.J., Lauer, H.C., Melliar-Smith, P.M., Randell, B.: A Program Structure for Error Detection and Recovery. In: Proc. Conf. on Operating Systems, Theoretical and Practical Aspects, 16th edn. Lecture Notes in Computer Science, IRIA, pp. 171–187. Springer, Heidelberg (1974)
Horning, J.J., Randell, B.: Process Structuring. ACM Computing Surveys 5(1), 5–30 (1973)
Jones, C.B.: A Formal Basis for some Dependability Notions. In: Aichernig, B.K., Maibaum, T. (eds.) Formal Methods at the Crossroads: from Panacea to Foundational Support. Springer, Heidelberg (2003)
Laprie, J.C. (ed.): Dependability: Basic concepts and associated terminology. Springer, Heidelberg (1991)
Laprie, J.C. (ed.): Dependability: Basic concepts and terminology — in English, French, German, Italian and Japanese. Springer, Vienna (1992)
Laprie, J.C.: Dependable Computing: Concepts, Limits, Challenges. In: 25th IEEE International Symposium on Fault-Tolerant Computing - Special Issue, Pasadena, California, USA, pp. 42–54. IEEE, Los Alamitos (1995)
Littlewood, B., Miller, D.R.: Conceptual Modelling of Coincident Failures in Multi-Version Software. IEEE Trans. Software Engineering 15(12), 1596–1614 (1989)
Lomet, D.B.: Process Structuring, Synchronization, and Recovery Using Atomic Actions. ACM SIGPLAN Notices 12(3), 128–137 (1977)
Naur, P., Randell, B. (eds.): Software Engineering: Report of a conference sponsored by the NATO Science Committee, Garmisch, Germany, October 7-11. Scientific Affairs Division, NATO, Brussels (1969)
Neumann, P.: Computer Related Risks. Addison-Wesley, New York (1995)
Randell, B.: Facing up to Faults (Turing Memorial Lecture). Computer Journal 43(2), 95–106 (2000)
Randell, B.: System Structure for Software Fault Tolerance. IEEE Trans. on Software Engineering SE-1(2), 220–232 (1975)
Romanovsky, A., Xu, J., Randell, B.: Exception Handling in Object-Oriented Real-Time Distributed Systems. In: Proc. 1st IEEE International Symposium on Object-Oriented Real-time Distributed Computing (ISORC 1998), Kyoto, Japan, pp. 32–42 (1998)
von Neumann, J.: Probabilistic Logic and the Synthesis of Reliable Organisms from Unreliable Components. In: Shannon, C.E., McCarthy, J. (eds.) Automata Studies, pp. 43–98. Princeton University Press, Princeton (1956)
Xu, J., Randell, B., Romanovsky, A., Stroud, R.J., Wu, Z.: Fault Tolerance in Concurrent Object-Oriented Software through Coordinated Error Recovery. In: Proc. 25th Int. Symp. Fault-Tolerant Computing (FTCS-25), Los Angeles. IEEE Computer Society Press, Los Alamitos (1995)
Xu, J., Randell, B., Romanovsky, A., Stroud, R.J., Zorzo, A., Canver, E., Henke, F.v.: Developing Control Software for Production Cell II: Failure Analysis and System Design Using CA Actions. In: FTCS-29, Madison, USA. IEEE CS Press, Los Alamitos (1999)
Xu, J., Randell, B., Romanovsky, A., Stroud, R.J., Zorzo, A.F., Canver, E., Henke, F.v.: Rigorous Development of a Safety-Critical System Based on Coordinated Atomic Actions. In: Proc. 29th Int. Symp. Fault-Tolerant Computing (FTCS-29), Madison. IEEE Computer Society Press, Los Alamitos (1999)
Xu, J., Randell, B., Romanovsky, A., Stroud, R.J., Zorzo, A.F., Canver, E., Henke, F.v.: Rigorous development of an Embedded Fault-Tolerant System Based on Coordinated Atomic Actions. IEEE Trans. on Computers (Special Issue on Fault Tolerance) 51(2), 164–179 (2002)
Xu, J., Romanovsky, A., Randell, B.: Co-ordinated Exception Handling in Distributed Object Systems: from Model to System Implementation. In: Proc. 18th IEEE International Conference on Distributed Computing Systems, Amsterdam, Netherlands, pp. 12–21 (1998)
Zorzo, A.F., Romanovsky, A., Xu, J., Randell, B., Stroud, R.J., Welch, I.S.: Using Coordinated Atomic Actions to Design Complex Safety-Critical Systems: The Production Cell Case Study. Software: Practice & Experience 29(8), 677–697 (1999)
Zurcher, F.W., Randell, B.: Iterative Multi-Level modelling: A methodology for computer system design. In: Proc. IFIP Congress 1968, Edinburgh, pp. D138-D142 (1968)
© 2003 Springer-Verlag Berlin Heidelberg
Randell, B. (2003). On Failures and Faults. In: Araki, K., Gnesi, S., Mandrioli, D. (eds) FME 2003: Formal Methods. FME 2003. Lecture Notes in Computer Science, vol 2805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45236-2_3
DOI: https://doi.org/10.1007/978-3-540-45236-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40828-4
Online ISBN: 978-3-540-45236-2