Advances in Computers

Volume 31, 1990, Pages 175-233
Availability and Reliability Modeling for Computer Systems

https://doi.org/10.1016/S0065-2458(08)60154-0

Publisher Summary

Dependability measures the capability of a product to deliver its intended level of service to the user, especially in light of failures or other incidents that impinge on its performance. It combines several underlying ideas, such as reliability, maintainability, availability, and user demand patterns, into a basic overall measure of quality, which customers use along with cost and performance to evaluate products. This chapter describes computer system dependability analysis and its types, the different classes of dependability measures, and the Markov and Markov reward models commonly used for dependability analysis, together with their solution methods. The three classes of dependability measures are system availability measures, system reliability measures, and task completion measures. The chapter also describes four types of dependability analysis: evaluation, sensitivity analysis, specification determination, and tradeoff analysis. A model-based evaluation, or sometimes a hybrid approach based on a judicious combination of models and measurements, is used for cost-effective dependability analysis. The chapter discusses the determination of model parameters, such as failure rates, coverage probabilities, repair rates, and reward rates, as well as model verification and validation. Finally, it demonstrates the use of these methods through a detailed dependability analysis of a full-system example representative of existing computer systems.
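As a rough illustration of the kind of Markov and Markov reward model the summary refers to, the sketch below works through the simplest possible case: a single repairable component with two states (up, down), a constant failure rate, and a constant repair rate. This is not taken from the chapter. The two-state structure, the function names, and the numeric rates are all assumptions chosen for illustration, and the formulas are the standard closed-form results for this textbook model.

```python
import math


def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Long-run fraction of time the component is up: mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)


def transient_availability(failure_rate: float, repair_rate: float, t: float) -> float:
    """Probability the component is up at time t, given it starts in the up state."""
    total = failure_rate + repair_rate
    return repair_rate / total + (failure_rate / total) * math.exp(-total * t)


def reliability(failure_rate: float, t: float) -> float:
    """Probability of no failure in [0, t]; repair is ignored for this measure."""
    return math.exp(-failure_rate * t)


def steady_state_reward(failure_rate: float, repair_rate: float,
                        reward_up: float, reward_down: float = 0.0) -> float:
    """Expected long-run reward rate when each state carries a reward rate."""
    a = steady_state_availability(failure_rate, repair_rate)
    return a * reward_up + (1.0 - a) * reward_down


if __name__ == "__main__":
    lam = 0.001  # assumed failure rate, per hour (hypothetical value)
    mu = 0.1     # assumed repair rate, per hour (hypothetical value)
    print("steady-state availability:", steady_state_availability(lam, mu))
    print("availability at t = 10 h:", transient_availability(lam, mu, 10.0))
    print("reliability at t = 100 h:", reliability(lam, 100.0))
    print("expected reward rate:", steady_state_reward(lam, mu, reward_up=1.0))
```

With a reward rate of 1 in the up state and 0 in the down state, the expected reward rate reduces to the steady-state availability, which is one way to see availability as a special case of a Markov reward measure. Realistic models of the kind the chapter analyzes have many more states and generally need numerical solution rather than closed forms.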
