Summary
To aid the designers of life-critical, fault-tolerant computing systems, accurate and efficient methods for reliability prediction are needed. Accuracy requires modeling the system in great detail, and hence addressing the problems of large state spaces, non-exponential distributions, and error analysis. Efficiency requires new model solution techniques, in particular the use of decomposition/aggregation in the context of a hybrid model. We describe a reliability prediction model that meets both requirements. Specifically, the model is partitioned into fault-occurrence and fault/error-handling submodels, which are represented by non-homogeneous Markov processes and extended stochastic Petri nets, respectively. The overall aggregated model is a stochastic process that is solved by numerical techniques. We also provide methods to analyze how variations in the input parameters affect the resulting reliability predictions.
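The decomposition described in the summary can be illustrated with a minimal sketch. Every modeling choice here is an assumption for illustration, not the authors' model. The sketch uses a hypothetical three-unit system that survives while at least one unit works. A Weibull per-unit failure rate makes the fault-occurrence model a non-homogeneous Markov process. A single constant coverage probability `c` stands in for the aggregated output of the fault/error-handling submodel; in the paper that submodel is an extended stochastic Petri net, not a constant. The chain is integrated with fixed-step classical Runge-Kutta (RK4), which is not necessarily the numerical method the authors use. The derivative dR/dc is estimated by central differences as a crude stand-in for the paper's parameter-variation analysis. All function names are hypothetical.

```python
import math


def weibull_hazard(t, eta, beta):
    """Assumed per-unit failure rate h(t) = (beta/eta) * (t/eta)**(beta - 1).
    beta == 1 gives the constant rate 1/eta (the homogeneous special case)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def derivatives(t, p, eta, beta, c):
    """Forward equations dp/dt = p Q(t) for p = [P3, P2, P1, PF].
    The index is the number of working units; PF is system failure.
    A fault is covered (unit removed, system reconfigured) with
    probability c and uncovered (immediate system failure) with 1 - c."""
    h = weibull_hazard(t, eta, beta)
    p3, p2, p1, _ = p
    out3 = 3.0 * h * p3  # fault arrivals while three units work
    out2 = 2.0 * h * p2  # fault arrivals while two units work
    out1 = h * p1        # last unit: any fault is fatal
    return [
        -out3,
        c * out3 - out2,
        c * out2 - out1,
        (1.0 - c) * (out3 + out2) + out1,
    ]


def reliability(t_end, eta, beta, c, steps=2000):
    """Integrate from t = 0 (all units up) with fixed-step RK4 and
    return R(t_end) = P3 + P2 + P1."""
    if beta < 1.0:
        raise ValueError("sketch assumes beta >= 1 so that h(0) is finite")
    dt = t_end / steps
    p = [1.0, 0.0, 0.0, 0.0]
    t = 0.0
    for _ in range(steps):
        k1 = derivatives(t, p, eta, beta, c)
        k2 = derivatives(t + dt / 2, [x + dt / 2 * k for x, k in zip(p, k1)], eta, beta, c)
        k3 = derivatives(t + dt / 2, [x + dt / 2 * k for x, k in zip(p, k2)], eta, beta, c)
        k4 = derivatives(t + dt, [x + dt * k for x, k in zip(p, k3)], eta, beta, c)
        p = [x + dt / 6 * (a + 2 * b + 2 * g + d)
             for x, a, b, g, d in zip(p, k1, k2, k3, k4)]
        t += dt
    return p[0] + p[1] + p[2]


def independent_units_reliability(t, eta, beta):
    """Closed-form check for c = 1: the three units then fail independently,
    so R(t) = 1 - (1 - exp(-(t/eta)**beta))**3."""
    unit_survival = math.exp(-((t / eta) ** beta))
    return 1.0 - (1.0 - unit_survival) ** 3


def sensitivity_to_coverage(t_end, eta, beta, c, dc=1e-4):
    """Central-difference estimate of dR(t_end)/dc (keep c + dc <= 1)."""
    return (reliability(t_end, eta, beta, c + dc)
            - reliability(t_end, eta, beta, c - dc)) / (2.0 * dc)
```

For example, `reliability(10.0, eta=1000.0, beta=1.5, c=0.999)` gives the predicted reliability at t = 10 under these assumed parameters. `sensitivity_to_coverage(10.0, 1000.0, 1.5, 0.999)` shows how strongly that prediction depends on the coverage estimate. With perfect coverage, comparing `reliability` against `independent_units_reliability` is a quick check that the non-homogeneous chain is being integrated correctly.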
Additional information
This work was supported in part by NASA grant NAG 1-70 and by an equipment grant from the Concurrent Computer Corp.
Cite this article
Geist, R., Smotherman, M., Trivedi, K. et al. The reliability of life-critical computer systems. Acta Informatica 23, 621–642 (1986). https://doi.org/10.1007/BF00264310
DOI: https://doi.org/10.1007/BF00264310