Abstract
Software fault tolerance has primarily been aimed at increasing overall software reliability. Unfortunately, it is impossible to provide general techniques that tolerate all faults with very high confidence. This paper presents some of the available experimental evidence for this. However, in some situations a more limited form of fault tolerance may be all that is needed: the program must be able to prevent unsafe states (but not necessarily all incorrect states), or to detect them and recover to a safe (but not necessarily correct) state. This approach is application-specific: the fault-tolerance facilities are designed for the particular application. This paper briefly describes how this can be accomplished. Although this approach requires more specific analysis of the problem than the more general ones do, it has the advantage of allowing partial verification of the adequacy of the fault tolerance used (e.g., it is possible to show that certain hazardous states cannot be caused by software faults), and it will therefore aid in certifying and licensing software that can potentially have catastrophic consequences. That is, the approach provides greater confidence about a more limited goal than more general approaches do. These techniques can also be used to tailor more general fault-tolerance techniques, such as recovery blocks, and to aid in writing acceptance tests that will ensure safety. Even with these techniques, it may not be possible to build systems with very low acceptable risk using software components.
The work reported in this paper was partially supported by Micros grants funded by the University of California, TRW, and Hughes Aircraft Co., by NASA grants NAG-1-511 and NAG-1-668, and by NSF grants DCR-8406532 and DCR-8521398.
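The idea of tailoring a recovery block so that its acceptance test checks safety rather than full correctness can be illustrated with a minimal sketch. This is not the paper's implementation: the function `safe_recovery_block`, the heater-power scenario, the limit `MAX_SAFE_POWER`, and the alternates `primary` and `backup` are all hypothetical names chosen for illustration. The sketch also assumes the alternates have no side effects, so no state checkpoint has to be restored between attempts.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def safe_recovery_block(
    alternates: Iterable[Callable[[], T]],
    is_safe: Callable[[T], bool],
    safe_state: T,
) -> T:
    """Return the first alternate result that passes the safety test.

    The acceptance test checks only that a result is safe, not that it is
    correct. If every alternate is rejected, a known safe state is returned.
    """
    for alternate in alternates:
        try:
            result = alternate()
        except Exception:
            # A crashed alternate is treated like a rejected one.
            continue
        if is_safe(result):
            return result
    return safe_state


# Hypothetical application: a heater power command in the range 0.0 to 1.0.
MAX_SAFE_POWER = 0.8  # assumed hazard limit, for illustration only


def is_safe_power(power: float) -> bool:
    # Safety acceptance test: the command must stay inside the safe envelope.
    return 0.0 <= power <= MAX_SAFE_POWER


def primary() -> float:
    return 1.7  # simulated software fault: command exceeds the safe limit


def backup() -> float:
    return 0.5  # simpler alternate; safe, though not necessarily optimal


command = safe_recovery_block([primary, backup], is_safe_power, safe_state=0.0)
print(command)  # 0.5
```

Here the faulty primary result is rejected because it is unsafe, and the backup result is accepted because it is safe, whether or not it is the correct value. If no alternate passes, the block returns the safe state (heater off), which matches the limited goal of recovering to a safe but not necessarily correct state.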
Copyright information
© 1987 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leveson, N.G. (1987). Software Fault Tolerance in Safety-Critical Applications. In: Belli, F., Görke, W. (eds) Fehlertolerierende Rechensysteme / Fault-Tolerant Computing Systems. Informatik-Fachberichte, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45628-2_1
DOI: https://doi.org/10.1007/978-3-642-45628-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-18294-8
Online ISBN: 978-3-642-45628-2
eBook Packages: Springer Book Archive