Reliability of non-repairable phased-mission systems with propagated failures

https://doi.org/10.1016/j.ress.2013.06.005

Highlights

  • Reliability of non-repairable phased-mission systems with propagated failures is analyzed.

  • The proposed method considers time-varying, phase-dependent failure rates and associated cumulative damage effects for the system elements.

  • The proposed method is recursive and can be fully automated.

Abstract

In this paper, we propose a recursive and exact method for evaluating the reliability of phased-mission systems in which failures originating from some system elements can propagate, causing common cause failures of groups of elements. The mission consists of multiple, consecutive, non-overlapping phases of operation, and the system consists of non-identical binary non-repairable elements that can fail individually or due to propagated failures originating from other elements. The overall system can have binary states corresponding to mission success and failure, or multiple states characterized by different levels of system performance. Based on conditional probabilities and the branch and bound principle, the proposed method takes into account dynamic changes in system structure and demand across phases. It also considers time-varying, phase-dependent failure rates and the associated cumulative damage effects for the system elements. The main advantage of this method is that it does not require composition of decision diagrams and can be fully automated. Both an analytical example and a numerical example are analyzed to illustrate the proposed method.

Introduction

In many practical applications such as aerospace, nuclear power, airborne weapon systems, and distributed computing systems, the system mission involves multiple, consecutive and non-overlapping phases of operation [1], [2], [3], [4], [5]. During each phase, the system has to accomplish a specified task and may be subject to different stresses and environmental conditions as well as different reliability requirements [1]. Thus, system configuration, success criteria, and component failure behavior may change from phase to phase. For example, a Mars orbiter mission system involves launch, cruise, Mars orbit insertion, commissioning, and orbit phases that must be accomplished in sequence [2]. Such systems are referred to as phased-mission systems (PMSs).

The dynamics in system configuration and component behavior typically require a distinct model for each phase of the PMS, which poses unique challenges to existing reliability analysis methods. Further complicating the analysis of PMS is the statistical dependence across different phases for a given component. For example, in a non-repairable PMS the state of a component at the beginning of a phase must be identical to its state at the end of the previous phase [6]. Various approaches have been developed for the reliability analysis of PMS. They can be divided into two classes: analytical modeling and simulation [7], [8]. The analytical modeling approaches can be further classified into three categories: combinatorial methods (e.g., binary decision diagrams) [6], [9], [10], [11], state-space oriented approaches based on Markov chains and/or Petri nets [12], [13], [14], [15], and a phase modular approach that combines the former two methods as appropriate [16], [17]. A state-of-the-art review of reliability modeling and analysis techniques for PMS is provided in Ref. [18]. However, even with advances in computing technology, only small-scale PMS problems can be solved accurately due to the high computational complexity of the existing methods.

Recently, Ref. [19] proposed an efficient and exact method for the reliability evaluation of non-repairable PMS with arbitrary system structure and non-identical binary-state elements. In this paper, we extend the method of Ref. [19] to PMS with propagated failures, assuming that failures originating from some system elements can propagate and cause common cause failures (CCF) of groups of elements.

Note that besides the internal cause considered in this work (in particular, propagated failures originating from some elements within the system), CCF can also be caused by external factors (e.g., sudden changes in environment, power-supply disturbances, and design mistakes). Many studies have shown that both types of CCF tend to increase the joint system failure probabilities and thus contribute significantly to the overall system unreliability [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31]. Therefore, it is important to account for the effect of CCF to obtain an accurate reliability analysis of systems subject to CCF. However, the existing works have various limitations, such as being restricted to a specific system structure [32], [33], [34]; being applicable only to systems with exponential time-to-failure distributions [24], [25], [35]; being subject to combinatorial explosion as the system redundancy increases [36], [37]; or assuming a single common cause that affects all the system components [33], [38]. In addition, most of the existing works on CCF are dedicated to single-phase systems; only a few consider CCF for PMS [2], [39], [40], and these focus on external causes. In particular, implicit methods were developed in Refs. [2], [39], which consider CCF in the process of binary decision diagram based analysis rather than in the modeling stage; an explicit method was applied in Ref. [40], which models CCF as shared basic events in the system fault tree model and then applies the minimal cut sets based inclusion–exclusion method with deletion for analysis. Compared with the implicit method, the explicit method may involve adding a large number of CCF basic events to the system model, which is tedious to handle, especially for high levels of redundancy [27].

In this paper, we propose a recursive method for the reliability analysis of PMS subject to CCF caused by internal causes, i.e., propagated failures. In particular, we consider the case when the system contains several disjoint groups of elements, referred to as common cause groups (CCGs), and the failure of any element in a CCG can, with some probability, cause the failure of the entire group. The conditional probability of failure propagation, given that the element fails, can be specific to each element and phase.
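To make the CCG notion concrete, the following minimal sketch computes the probability that at least one propagated failure occurs in a single CCG during one phase. It is an illustration under stated assumptions, not the method of the paper: the function name and the inputs `q` (per-element probability of a local failure during the phase) and `eps` (conditional propagation probability given that failure) are hypothetical, and the elements are assumed to fail independently.

```python
from math import prod


def ccg_propagation_probability(q, eps):
    """Probability that some element of a CCG fails and propagates in a phase.

    q[j]   -- assumed probability that element j fails locally in the phase
    eps[j] -- assumed conditional probability that this failure propagates
    Both sequences are hypothetical inputs used only for illustration.
    """
    if len(q) != len(eps):
        raise ValueError("q and eps must have the same length")
    # Element j triggers a group-wide failure with probability q[j] * eps[j].
    # Under independence, the group escapes propagation only if no element
    # triggers it, so the complement of that product is returned.
    return 1.0 - prod(1.0 - qj * ej for qj, ej in zip(q, eps))


if __name__ == "__main__":
    # Two-element group: element 1 propagates half of its failures,
    # element 2 propagates all of them.
    print(ccg_propagation_probability([0.1, 0.2], [0.5, 1.0]))
```

Because the propagation probabilities may differ by element and by phase, such a function would be called once per phase with that phase's values.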

The remainder of the paper is organized as follows. Section 2 describes the binary and multi-state systems and defines the system mission success probability based on an acceptability function in each phase. Section 3 presents the assumptions. Section 4 summarizes the method for evaluating the conditional reliability of an element and a CCG at a particular phase given that it is working at the beginning of the phase. Section 5 describes the proposed recursive PMS reliability evaluation algorithm. Section 6 illustrates the proposed algorithm using both an analytical example and a realistic-size numerical example. Section 7 presents conclusions and future work.

Section snippets

Mission success for binary systems and multi-state systems with binary elements

This paper considers two types of PMS consisting of binary elements: binary systems and multi-state systems. The binary system can have only two states (corresponding to success and failure of the system mission); the system mission fails if the system fails at any phase. The multi-state system can have multiple states characterized by different levels of system performance and corresponding to different combinations of states of its elements; the system mission fails if its performance takes

Assumptions

The proposed method is based on the following assumptions:

  • The system mission consists of H consecutive and non-overlapping phases.

  • The system has n independent binary-state elements.

  • Neither the system nor its elements are repairable during the mission.

  • The elements can have phase-dependent and time-varying failure rates.

  • The baseline failure time distribution of each individual element is the same during all phases.

  • The phase time does not depend on system state.

  • The system structure can change with the

Conditional unreliabilities of system elements and CCGs

This section describes the method for evaluating the conditional reliability and unreliability of element j at phase h given that it is working at the beginning of the phase, denoted by pj(h) and qj(h), respectively. The effect of phase-dependent stress on the failure properties of the elements is modeled using the concept of equivalent age associated with the cumulative exposure model (CEM) [44]. Let Fj(h,t) be the stress-dependent failure distribution of element j in phase h. When the life–stress
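The equivalent-age idea can be sketched as follows. Since the life–stress relationship is cut off in this excerpt, the sketch assumes a Weibull distribution whose scale parameter changes with the phase stress while the shape stays fixed. This choice, and all function and parameter names, are assumptions made for illustration only. At each phase boundary, the equivalent age is the time under the new stress that reproduces the cumulative failure probability already accumulated.

```python
import math


def weibull_cdf(t, scale, shape):
    """Weibull failure distribution F(t) = 1 - exp(-(t/scale)^shape)."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_quantile(p, scale, shape):
    """Inverse of weibull_cdf: the age t at which F(t) = p."""
    return scale * (-math.log1p(-p)) ** (1.0 / shape)


def conditional_unreliabilities(durations, scales, shape):
    """Return [q(1), ..., q(H)] for one element under the assumed model.

    durations[h] -- length of phase h
    scales[h]    -- assumed Weibull scale under the stress of phase h
    shape        -- assumed common Weibull shape (baseline distribution)
    """
    q = []
    cum_fail = 0.0  # failure probability accumulated before the current phase
    for duration, scale in zip(durations, scales):
        # Equivalent age: time under this phase's stress that gives the
        # same cumulative failure probability as the history so far.
        age = weibull_quantile(cum_fail, scale, shape)
        f_end = weibull_cdf(age + duration, scale, shape)
        # Failure in this phase, conditioned on working at its start.
        q.append((f_end - cum_fail) / (1.0 - cum_fail))
        cum_fail = f_end
    return q


if __name__ == "__main__":
    print(conditional_unreliabilities([1.0, 2.0], [10.0, 5.0], 2.0))
```

The conditional reliability is then 1 minus the returned value for each phase, and the product of these reliabilities over all phases gives the probability that the element survives the mission when no propagated failure reaches it.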

Generating combinations of element failures

Consider a random vector Xh=(x1(h),…, xn(h)) representing the system state at the end of phase h and assume that a realization Y=(y1(h),…, yn(h)) of this vector consists of s zeros (s out of n elements fail before the end of phase h). The zero elements have position numbers c(i) (1≤i≤s). Let δ(Y) be the set of position numbers of zero elements in vector Y: δ(Y)={c(1), c(2), …, c(s)}⊂Δ. Since xj(h) are non-increasing functions of h, any yj(h)=1 in Y implies that xj(m)=1 for m=1, …, h−1. Thus,
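The monotonicity property stated above can be illustrated with a short sketch. It computes δ(Y) for a realization Y and lists the states at an earlier phase that can lead to Y without repair: every element working in Y must have been working earlier, while each failed element may or may not have failed already. The function names are hypothetical, and since the paragraph is cut off in this excerpt, this shows only the stated property and not the enumeration procedure of the paper.

```python
from itertools import combinations


def zero_positions(y):
    """delta(Y): 1-based position numbers of the failed (zero) elements."""
    return [j for j, state in enumerate(y, start=1) if state == 0]


def compatible_previous_states(y):
    """States at an earlier phase that can evolve into y without repair.

    Elements working in y are fixed to 1; every subset of delta(Y) is a
    possible set of elements that had already failed earlier.
    """
    delta = zero_positions(y)
    states = []
    for r in range(len(delta) + 1):
        for failed_earlier in combinations(delta, r):
            prev = [1] * len(y)
            for j in failed_earlier:
                prev[j - 1] = 0
            states.append(tuple(prev))
    return states


if __name__ == "__main__":
    y = (1, 0, 0)
    print(zero_positions(y))
    for state in compatible_previous_states(y):
        print(state)
```

For a realization with s zeros the sketch returns 2^s earlier states, which is why restricting the search to admissible states matters as n grows.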

Analytical example

Consider a non-repairable system consisting of three elements. The system has to perform its task during three consecutive phases. Assume that φ1(X)=x1(x2+x3); φ2(X)=x2x3; φ3(X)=x2. The realizations of the system state vector Xh corresponding to the working system state by the end of phase h are (111), (110) and (101) for h=1; (011) and (111) for h=2; and (111), (110), (011) and (010) for h=3. The system contains one CCG ω1={1,3}. The failure propagation probabilities for elements 1 and 3 are ε1(h)
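The working states listed for each phase can be checked by brute-force enumeration of the three structure functions. The sketch below reads the "+" in φ1 as a logical OR and products as logical AND, which is an interpretation assumed here; the names `PHI` and `working_states` are hypothetical.

```python
from itertools import product

# Structure functions of the three phases, with "+" read as OR.
PHI = {
    1: lambda x1, x2, x3: x1 and (x2 or x3),
    2: lambda x1, x2, x3: x2 and x3,
    3: lambda x1, x2, x3: x2,
}


def working_states(h):
    """Set of state vectors (as strings) for which the system works in phase h."""
    return {
        "".join(map(str, x))
        for x in product((0, 1), repeat=3)
        if PHI[h](*x)
    }


if __name__ == "__main__":
    for h in (1, 2, 3):
        print(h, sorted(working_states(h), reverse=True))
```

For three elements the enumeration covers only eight state vectors, so it serves as a sanity check rather than as an evaluation method.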

Conclusion and future work

Propagated failures are one type of common-cause failures that involve simultaneous failure of multiple system elements due to a failure originating from some internal system element. In this paper, a recursive method based on conditional probabilities as well as the branch and bound principle was proposed for the reliability analysis of PMS subject to propagated failures. The proposed method has no limitation on the type of element time-to-failure distributions.

As one direction of our future

References (49)

  • L. Xing. Reliability evaluation of phased-mission systems with imperfect fault coverage and common-cause failures. IEEE Transactions on Reliability (2007)
  • A. Pedar et al. Phased-mission analysis for evaluating the effectiveness of aerospace computing-systems. IEEE Transactions on Reliability (1981)
  • H.S. Winokur et al. Analysis of mission-oriented systems. IEEE Transactions on Reliability (1969)
  • J.L. Bricker. A unified method for analyzing mission reliability for fault tolerant computer systems. IEEE Transactions on Reliability (1973)
  • X. Zang et al. A BDD-based algorithm for reliability analysis of phased-mission systems. IEEE Transactions on Reliability (1999)
  • R.E. Altschul et al. The efficient simulation of phased fault trees. Proceedings of the Annual Reliability and Maintainability Symposium (1987)
  • F.A. Tillman et al. Simulation model of mission effectiveness for military systems. IEEE Transactions on Reliability (1978)
  • L. Xing et al. Analysis of generalized phased mission system reliability, performance and sensitivity. IEEE Transactions on Reliability (2002)
  • J.D. Esary et al. Reliability analysis of phased missions
  • Somani AK, Trivedi KS. Boolean algebraic methods for phased-mission system analysis, NASA Langley Research Center,...
  • A. Bondavalli et al. Dependability modeling and evaluation of multiple-phased systems using DEEM. IEEE Transactions on Reliability (2004)
  • J.B. Dugan. Automated analysis of phased-mission reliability. IEEE Transactions on Reliability (1991)
  • I. Mura et al. Markov regenerative stochastic Petri nets to model and evaluate phased mission systems dependability. IEEE Transactions on Computers (2001)
  • M.K. Smotherman et al. A non-homogeneous Markov model for phased-mission reliability analysis. IEEE Transactions on Reliability (1989)