
The reliability of life-critical computer systems

Acta Informatica

Summary

In order to aid the designers of life-critical, fault-tolerant computing systems, accurate and efficient methods for reliability prediction are needed. The accuracy requirement implies the need to model the system in great detail, and hence the need to address the problems of large state space, non-exponential distributions, and error analysis. The efficiency requirement implies the need for new model solution techniques, in particular the use of decomposition/aggregation in the context of a hybrid model. We describe a model for reliability prediction which meets both requirements. Specifically, our model is partitioned into fault occurrence and fault/error handling submodels, which are represented by non-homogeneous Markov processes and extended stochastic Petri nets, respectively. The overall aggregated model is a stochastic process that is solved by numerical techniques. Methods to analyze the effects of variations in input parameters on the resulting reliability predictions are also provided.
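As a rough illustration of the fault-occurrence side of such a model, the sketch below solves a small non-homogeneous Markov chain numerically and estimates how the predicted reliability responds to a change in one input parameter. It is not the authors' model or solver. The three-state duplex system, the Weibull hazard with parameters `lam` and `alpha`, the single coverage probability `c` standing in for the result of a fault/error handling submodel, and the fixed-step Runge-Kutta integrator are all assumptions chosen for brevity.

```python
import math

def hazard(t, lam, alpha):
    """Weibull hazard h(t) = lam * alpha * t**(alpha - 1), a time-varying failure rate."""
    return lam * alpha * t ** (alpha - 1)

def derivatives(t, p, lam, alpha, c):
    """Kolmogorov forward equations for the states (two units up, one unit up, failed)."""
    p2, p1, _ = p
    h = hazard(t, lam, alpha)
    return (
        -2.0 * h * p2,                      # either of the two units fails
        2.0 * h * c * p2 - h * p1,          # covered fault arrives; last unit may fail
        2.0 * h * (1.0 - c) * p2 + h * p1,  # uncovered fault, or loss of the last unit
    )

def reliability(t_end, lam, alpha, c, steps=2000):
    """Integrate the chain from 0 to t_end with classical RK4; return P(not failed)."""
    dt = t_end / steps
    t = 0.0
    p = (1.0, 0.0, 0.0)  # start with both units working
    for _ in range(steps):
        k1 = derivatives(t, p, lam, alpha, c)
        k2 = derivatives(t + dt / 2, tuple(x + dt / 2 * k for x, k in zip(p, k1)), lam, alpha, c)
        k3 = derivatives(t + dt / 2, tuple(x + dt / 2 * k for x, k in zip(p, k2)), lam, alpha, c)
        k4 = derivatives(t + dt, tuple(x + dt * k for x, k in zip(p, k3)), lam, alpha, c)
        p = tuple(
            x + dt / 6 * (a + 2 * b + 2 * g + d)
            for x, a, b, g, d in zip(p, k1, k2, k3, k4)
        )
        t += dt
    return p[0] + p[1]

def closed_form(t_end, lam, alpha, c):
    """Exact reliability of this toy chain, used only to check the numerical solution."""
    H = lam * t_end ** alpha  # cumulative hazard of one unit
    return math.exp(-2 * H) + 2 * c * (math.exp(-H) - math.exp(-2 * H))

def coverage_sensitivity(t_end, lam, alpha, c, delta=1e-4):
    """Central-difference estimate of dR/dc, the effect of varying the coverage input."""
    upper = reliability(t_end, lam, alpha, c + delta)
    lower = reliability(t_end, lam, alpha, c - delta)
    return (upper - lower) / (2 * delta)

if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    args = (10.0, 1e-5, 2.0, 0.99)
    print("numerical R(t):", reliability(*args))
    print("closed-form R(t):", closed_form(*args))
    print("dR/dc:", coverage_sensitivity(*args))
```

A chain this small has a closed-form solution, which is why it can serve as a check on the integrator. The models the summary has in mind are far larger, so they must be solved numerically.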




Additional information

This work was supported in part by NASA grant NAG 1-70 and by an equipment grant from the Concurrent Computer Corp.


About this article

Cite this article

Geist, R., Smotherman, M., Trivedi, K. et al. The reliability of life-critical computer systems. Acta Informatica 23, 621–642 (1986). https://doi.org/10.1007/BF00264310


  • DOI: https://doi.org/10.1007/BF00264310
