Reliability of non-repairable phased-mission systems with propagated failures
Introduction
In many practical applications, such as aerospace, nuclear power, airborne weapon systems, and distributed computing systems, the system mission involves multiple consecutive, non-overlapping phases of operation [1], [2], [3], [4], [5]. During each phase, the system has to accomplish a specified task and may be subject to different stresses and environmental conditions as well as different reliability requirements [1]. Thus, the system configuration, success criteria, and component failure behavior may change from phase to phase. For example, a Mars orbiter mission involves launch, cruise, Mars orbit insertion, commissioning, and orbit phases that must be accomplished in sequence [2]. Such systems are referred to as phased-mission systems (PMSs).
The dynamics in system configuration and component behavior typically require a distinct model for each phase of the PMS, which poses unique challenges to existing reliability analysis methods. The analysis of PMS is further complicated by the statistical dependence across phases for a given component. For example, in a non-repairable PMS the state of a component at the beginning of a phase must be identical to its state at the end of the previous phase [6]. Various approaches have been developed for the reliability analysis of PMS. They can be divided into two classes: analytical modeling and simulation [7], [8]. The analytical modeling approaches can be further classified into three categories: combinatorial methods (e.g., binary decision diagrams) [6], [9], [10], [11], state-space oriented approaches based on Markov chains and/or Petri nets [12], [13], [14], [15], and a phase modular approach that combines the former two methods as appropriate [16], [17]. A state-of-the-art review of reliability modeling and analysis techniques for PMS is provided in Ref. [18]. However, even with advances in computing technology, only small-scale PMS problems can be solved accurately because of the high computational complexity of the existing methods.
Recently, Ref. [19] proposed an efficient and exact method for the reliability evaluation of non-repairable PMS with arbitrary system structure and non-identical binary-state elements. In this paper we extend the method of Ref. [19] to PMS with propagated failures, assuming that failures originating from some system elements can propagate and cause common cause failures (CCF) of groups of elements.
Note that besides the internal cause considered in this work (in particular, propagated failures originating from some elements within the system), CCF can also be caused by external factors (e.g., sudden changes in environment, power-supply disturbances, and design mistakes). Many studies have shown that both types of CCF tend to increase the joint system failure probabilities and thus contribute significantly to the overall system unreliability [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31]. Therefore, it is important to account for CCF in order to analyze the reliability of such systems accurately. However, the existing works have various limitations, such as being restricted to a specific system structure [32], [33], [34]; being applicable only to systems with exponential time-to-failure distributions [24], [25], [35]; being subject to combinatorial explosion as the system redundancy increases [36], [37]; or assuming a single common cause that affects all the system components [33], [38]. In addition, most of the existing works on CCF are dedicated to single-phase systems; only a few consider CCF for PMS [2], [39], [40], and they focus on external causes. In particular, implicit methods were developed in Refs. [2], [39], which consider CCF during the binary decision diagram based analysis rather than in the modeling stage. An explicit method was applied in Ref. [40], which models CCF as shared basic events in the system fault tree model and then applies the minimal cut sets based inclusion–exclusion method with deletion for analysis. Compared to the implicit method, the explicit method may involve adding a large number of CCF basic events to the system model, which is tedious to handle, especially at high levels of redundancy [27].
In this paper we propose a recursive method for the reliability analysis of PMS subject to CCF caused by internal causes, i.e., propagated failures. In particular, we consider the case when the system contains several disjoint groups of elements, referred to as common cause groups (CCGs), and the failure of any element in a CCG can, with some probability, cause the failure of the entire group. The conditional probability of failure propagation, given that the element fails, can be specific to each element and phase.
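The CCG failure model described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only assumes that, within one phase, each element j of a group fails independently with some conditional probability `q[j]` and that its failure propagates with probability `eps[j]`. The function name and the numbers are hypothetical.

```python
from math import prod

def prob_no_propagation(q, eps):
    """Probability that no propagated failure originates in one CCG during a phase.

    q[j]   -- assumed conditional failure probability of element j in the phase
    eps[j] -- assumed probability that a failure of element j propagates
    """
    if len(q) != len(eps):
        raise ValueError("q and eps must have the same length")
    # Element j triggers a group failure with probability q[j] * eps[j];
    # elements are assumed to fail independently of one another.
    return prod(1.0 - qj * ej for qj, ej in zip(q, eps))

# Hypothetical two-element group (illustrative numbers only)
q = [0.1, 0.2]
eps = [0.5, 0.25]
p_ok = prob_no_propagation(q, eps)
p_group_fails = 1.0 - p_ok  # probability that the whole group fails by propagation
```

Under these assumptions, setting all `eps[j]` to zero recovers the case without propagated failures, in which the group can never fail as a whole because of a single element.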
The remainder of the paper is organized as follows. Section 2 describes the binary and multi-state systems and defines the system mission success probability based on an acceptability function in each phase. Section 3 presents the assumptions. Section 4 summarizes the method for evaluating the conditional reliability of an element and a CCG at a particular phase given that it is working at the beginning of the phase. Section 5 describes the proposed recursive PMS reliability evaluation algorithm. Section 6 illustrates the proposed algorithm using both an analytical example and a realistic-size numerical example. Section 7 presents the conclusion and future work.
Section snippets
Mission success for binary systems and multi-state systems with binary elements
This paper considers two types of PMS consisting of binary elements: binary systems and multi-state systems. The binary system can have only two states (corresponding to success and failure of the system mission); the system mission fails if the system fails at any phase. The multi-state system can have multiple states characterized by different levels of system performance and corresponding to different combinations of states of its elements; the system mission fails if its performance takes
Assumptions
The proposed method is based on the following assumptions:
- The system mission consists of H consecutive and non-overlapping phases.
- The system has n independent binary-state elements.
- Neither the system nor its elements are repairable during the mission.
- The elements can have phase-dependent and time-varying failure rates.
- The baseline failure time distribution of each individual element is the same during all phases.
- The phase time does not depend on system state.
- The system structure can change with the
Conditional unreliabilities of system elements and CCGs
This section describes the method for evaluating the conditional reliability pj(h) and unreliability qj(h) of element j at phase h, given that the element is working at the beginning of the phase. The effect of phase-dependent stress on the failure properties of the elements is modeled using the concept of equivalent age associated with the cumulative exposure model (CEM) [44]. Let Fj(h,t) be the stress-dependent failure distribution of element j in phase h. When the life–stress
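As a rough illustration of a conditional unreliability of this kind, the Python sketch below computes the probability that an element fails between two ages, given that it is working at the first one. The Weibull baseline, the function names, and the parameter values are assumptions made for the example. How the equivalent ages follow from the phase stresses under the CEM is not reproduced here, so `age_start` and `age_end` are simply taken as given.

```python
import math

def weibull_cdf(t, scale, shape):
    """Baseline failure distribution F(t); the Weibull form is a hypothetical choice."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0

def conditional_unreliability(cdf, age_start, age_end):
    """q = P(fail in (age_start, age_end] | working at age_start)."""
    surv_start = 1.0 - cdf(age_start)
    if surv_start <= 0.0:
        raise ValueError("element cannot be working at age_start")
    return (cdf(age_end) - cdf(age_start)) / surv_start

# Hypothetical element with an exponential baseline (shape = 1, scale = 100)
cdf = lambda t: weibull_cdf(t, scale=100.0, shape=1.0)
q = conditional_unreliability(cdf, age_start=50.0, age_end=80.0)
p = 1.0 - q  # conditional reliability over the same interval
```

With shape 1 the baseline is memoryless, so `q` depends only on the age increment; for other shapes the result also depends on the age accumulated in earlier phases.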
Generating combinations of element failures
Consider a random vector Xh=(x1(h),…, xn(h)) representing the system state at the end of phase h and assume that a realization Y=(y1(h),…, yn(h)) of this vector consists of s zeros (s out of n elements fail before the end of phase h). The zero elements have position numbers c(i) (1≤i≤s). Let δ(Y) be the set of position numbers of zero elements in vector Y: δ(Y)={c(1), c(2), …, c(s)}⊂Δ. Since xj(h) are non-increasing functions of h, any yj(h)=1 in Y implies that xj(m)=1 for m=1, …, h−1. Thus,
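The monotonicity property stated above (a working element at the end of phase h must have been working in all earlier phases) can be sketched in Python as follows. This is an illustration, not the paper's generation procedure: the helper names are hypothetical, and the code only enumerates the state vectors of an earlier phase that are compatible with a given realization Y.

```python
from itertools import combinations

def failed_positions(y):
    """delta(Y): 1-based positions of zero (failed) elements in state vector y."""
    return [i + 1 for i, v in enumerate(y) if v == 0]

def earlier_states(y):
    """All state vectors that could precede y at the end of an earlier phase.

    Elements are non-repairable, so the set of failed elements in an earlier
    phase must be a subset of delta(y).
    """
    delta = failed_positions(y)
    states = []
    for k in range(len(delta) + 1):
        for subset in combinations(delta, k):
            states.append(tuple(0 if i + 1 in subset else 1 for i in range(len(y))))
    return states

# Elements 1 and 3 failed by the end of the phase, so there are 2**2 candidates.
candidates = earlier_states((0, 1, 0))
```

A realization with s failed elements therefore has 2^s compatible earlier state vectors, which is why limiting the enumeration matters for larger systems.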
Analytical example
Consider a non-repairable system consisting of three elements. The system has to perform its task during three consecutive phases. Assume that φ1(X)=x1(x2+x3); φ2(X)=x2x3; φ3(X)=x2. The realizations of the system state vector Xh that correspond to a working system at the end of phase h are (111), (110) and (101) for h=1; (011) and (111) for h=2; and (111), (110), (011) and (010) for h=3. The system contains one CCG ω1={1,3}. The failure propagation probabilities for elements 1 and 3 are ε1(h)
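The working realizations listed in this example can be checked by brute force. The short Python sketch below enumerates all eight state vectors and keeps those for which the phase structure function equals one. It assumes that "+" in φ1 denotes logical OR and that products denote logical AND; the dictionary and function names are hypothetical.

```python
from itertools import product

# Structure functions of the example; "+" is read as OR, products as AND (assumption).
phi = {
    1: lambda x1, x2, x3: x1 and (x2 or x3),
    2: lambda x1, x2, x3: x2 and x3,
    3: lambda x1, x2, x3: x2,
}

def working_states(h):
    """State vectors (x1, x2, x3) for which the system works in phase h."""
    return {x for x in product((0, 1), repeat=3) if phi[h](*x)}

for h in (1, 2, 3):
    print(h, sorted(working_states(h), reverse=True))
```

Under this reading, the enumeration gives three working vectors for phase 1, two for phase 2, and four for phase 3, matching the realizations listed in the example.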
Conclusion and future work
Propagated failures are one type of common-cause failures that involve simultaneous failure of multiple system elements due to a failure originating from some internal system element. In this paper, a recursive method based on conditional probabilities as well as the branch and bound principle was proposed for the reliability analysis of PMS subject to propagated failures. The proposed method has no limitation on the type of element time-to-failure distributions.
As one direction of our future
References (49)
- et al. A systematic procedure for incorporation of common cause events into risk and reliability models. Nuclear Engineering and Design (1986)
- Common cause failure probabilities in standby safety system fault tree analysis with testing — scheme and timing dependencies. Reliability Engineering & System Safety (2003)
- Uncertainties and quantification of common cause failure rates and probabilities for system analyses. Reliability Engineering & System Safety (2005)
- et al. On the value of redundancy subject to common-cause failures: toward the resolution of an on-going debate. Reliability Engineering & System Safety (2009)
- et al. Stochastic analysis of a parallel system with common cause failures, preventive maintenance and two types of repair. Microelectronics and Reliability (1985)
- Fault tree analysis of phased mission systems with repairable and non-repairable components. Reliability Engineering and System Safety (2001)
- et al. Optimal separation of elements in vulnerable multi-state systems. Reliability Engineering & System Safety (2001)
- et al. Separation in homogeneous systems with independent identical elements. European Journal of Operational Research (2010)
- et al. BDD-based reliability evaluation of phased-mission systems with internal/external common-cause failures. Reliability Engineering & System Safety (2013)
- et al. Computationally efficient phased-mission reliability analysis for systems with variable configurations. IEEE Transactions on Reliability (1992)
- Reliability evaluation of phased-mission systems with imperfect fault coverage and common-cause failures. IEEE Transactions on Reliability
- Phased-mission analysis for evaluating the effectiveness of aerospace computing-systems. IEEE Transactions on Reliability
- Analysis of mission-oriented systems. IEEE Transactions on Reliability
- A unified method for analyzing mission reliability for fault tolerant computer systems. IEEE Transactions on Reliability
- A BDD-based algorithm for reliability analysis of phased-mission systems. IEEE Transactions on Reliability
- The efficient simulation of phased fault trees. Proceedings of the Annual Reliability and Maintainability Symposium
- Simulation model of mission effectiveness for military systems. IEEE Transactions on Reliability
- Analysis of generalized phased mission system reliability, performance and sensitivity. IEEE Transactions on Reliability
- Reliability analysis of phased missions
- Dependability modeling and evaluation of multiple-phased systems using DEEM. IEEE Transactions on Reliability
- Automated analysis of phased-mission reliability. IEEE Transactions on Reliability
- Markov regenerative stochastic Petri nets to model and evaluate phased mission systems dependability. IEEE Transactions on Computers
- A non-homogeneous Markov model for phased-mission reliability analysis. IEEE Transactions on Reliability