Competing failure analysis in phased-mission systems with functional dependence in one of phases
Highlights
► Reliability of non-repairable phased-mission systems subject to competing failure propagation and isolation effects is analyzed.
► The proposed combinatorial and analytical algorithm exercises the “decomposition and aggregation” strategy.
► The proposed algorithm has no limitation on the type of time-to-failure distributions for system components.
Introduction
In many real-world applications, such as aerospace, nuclear power, airborne weapon systems and distributed computing systems, multiple phases are involved in the system mission [1], [2], [3], [4], [5], [6]. During different phases, different tasks have to be accomplished, and the system may be subject to different stresses, environmental conditions, as well as reliability requirements. Thus, system configuration, success criteria, and component behavior may vary from phase to phase [1]. A classic example is an aircraft flight which involves taxi, take-off, ascent, level flight, descent, and landing phases [2], [6]. If there are two engines, one engine is usually required during the taxi phase, but both engines are necessary during the take-off phase. In addition, the engines are more likely to fail during the take-off phase due to the enormous stress in this phase as compared to other phases of the flight profile [7]. Systems used in those multi-phased missions are referred to as phased-mission systems (PMS).
Besides the above described dynamics in the system configuration, success criteria, and component behavior, statistical dependencies of component states across phases (in particular, the state of a component at the beginning of a phase should be identical to its state at the end of the previous phase) also contribute to the difficulty in analyzing the reliability of PMS [8]. Ref. [9] reviews the state-of-the-art of PMS reliability modeling and analysis techniques, where two classes of approaches were identified: analytical methods [1], [2], [3], [4], [5], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17] and simulations [18], [19]. The analytical methods can be further classified into three categories: state space-oriented methods [1], [13], [14], [15], [16], combinatorial methods [3], [4], [5], [8], [10], [11], [12], and hybrid methods that combine the former two analytical methods as appropriate [2], [17]. To the best of our knowledge, no existing works on PMS have addressed competing failure propagation and isolation effects [20].
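The cross-phase dependence described above can be made concrete with the two-engine flight example. The sketch below is only an illustration under stated assumptions: it considers just the taxi and take-off phases, treats the two engines as independent and non-repairable with constant (exponential) failure rates in each phase, and uses hypothetical rate and duration values and function names that do not come from the paper. It compares the mission reliability obtained by tracking each engine across both phases with a naive product of per-phase reliabilities that ignores the shared component states.

```python
import math

def engine_survival(rate_taxi, t_taxi, rate_takeoff, t_takeoff):
    """Probability that one non-repairable engine survives taxi and take-off.

    Assumes a constant failure rate within each phase (hypothetical values).
    """
    return math.exp(-(rate_taxi * t_taxi + rate_takeoff * t_takeoff))

def mission_reliability(rate_taxi, t_taxi, rate_takeoff, t_takeoff):
    """Mission reliability that respects cross-phase dependence.

    Take-off needs both engines. An engine that is working at the end of
    take-off was necessarily working during taxi, so the 1-out-of-2 taxi
    requirement is then satisfied automatically.
    """
    p = engine_survival(rate_taxi, t_taxi, rate_takeoff, t_takeoff)
    return p * p

def naive_product(rate_taxi, t_taxi, rate_takeoff, t_takeoff):
    """Product of per-phase reliabilities, treating the phases as independent.

    This ignores that an engine failed in taxi is still failed at take-off.
    """
    p_taxi = math.exp(-rate_taxi * t_taxi)
    p_takeoff = math.exp(-rate_takeoff * t_takeoff)
    r_taxi = 1.0 - (1.0 - p_taxi) ** 2   # at least one of two engines works
    r_takeoff = p_takeoff ** 2           # both engines work
    return r_taxi * r_takeoff

if __name__ == "__main__":
    args = (0.1, 1.0, 0.5, 0.2)  # hypothetical rates (per hour) and durations (hours)
    print("with cross-phase dependence:", mission_reliability(*args))
    print("naive per-phase product:    ", naive_product(*args))
```

Under these assumptions the naive product is never smaller than the dependence-aware value, because it credits missions in which an engine lost during taxi is treated as available again at take-off.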
Specifically, the failure propagation effect, or a propagated failure, occurs when a failure originating from a system component causes damage to other system components besides the component itself. Furthermore, if the failure affects all the system components and thus causes the entire system to fail, a propagated failure with global effect (PFGE) occurs [21]. There are two major causes for the occurrence of a PFGE: imperfect fault coverage resulting from the malfunction of the system’s automatic fault detection/recovery mechanism [10], [22], [23], [24], [25], and the destructive effect (e.g., explosion, blackout, overheating, voltage surge) of a failed system component on other system components [20]. However, a PFGE does not always cause total failure in systems subject to functional dependence (FDEP) behavior, where the failure of a trigger component causes other components (referred to as dependent components) to become inaccessible or unusable. In particular, if the trigger component fails before the PFGE of a dependent component occurs, the failure isolation effect takes place. In this case, the failure of the trigger component not only isolates the dependent components from the rest of the system but also makes the system insensitive to any failure originating from the dependent components; the entire system may or may not fail, depending on the remaining operational components and the system structure function. However, if the trigger component is still functioning when the PFGE of a dependent component happens, the propagation effect takes place, causing the overall system failure.
Consider a specific example where communication among computers within a computer network is achieved through network interface cards (NIC). Each NIC is a trigger component and the computers connected through it are dependent components. A computer virus in a connected computer may be able to spread through and crash the entire network. However, if the NIC fails first, the failure isolation effect takes place and the virus affects only the local computer instead of the entire network [26], [27]. In summary, a propagated failure in a system subject to functional dependence behavior has two distinct consequences, determined by the competition in the time domain between the failure of the trigger component and the PFGE of a dependent component, or in other words, between the failure isolation and failure propagation effects. Such competing failure behavior must be addressed for accurate system reliability analysis.
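The time-domain competition between isolation and propagation can be sketched numerically for a single trigger (e.g., the NIC) and a single dependent component (e.g., a connected computer). The following is a minimal illustration, not the paper's algorithm: it assumes independent exponential times to trigger failure and to PFGE, and the rates, mission time, and function names are hypothetical. The closed-form split is cross-checked with a small seeded simulation.

```python
import math
import random

def competing_outcomes(rate_trigger, rate_pfge, mission_time):
    """Closed-form probabilities of the three disjoint outcomes.

    Assumes independent exponential times (an illustrative assumption).
    'isolation'   : trigger fails first, within the mission time
    'propagation' : PFGE occurs first, within the mission time
    'neither'     : no trigger failure and no PFGE during the mission
    """
    total = rate_trigger + rate_pfge
    p_none = math.exp(-total * mission_time)
    p_first_event = 1.0 - p_none
    return {
        "isolation": rate_trigger / total * p_first_event,
        "propagation": rate_pfge / total * p_first_event,
        "neither": p_none,
    }

def simulate(rate_trigger, rate_pfge, mission_time, runs=20000, seed=1):
    """Monte Carlo estimate of the same three probabilities."""
    rng = random.Random(seed)
    counts = {"isolation": 0, "propagation": 0, "neither": 0}
    for _ in range(runs):
        t_trigger = rng.expovariate(rate_trigger)
        t_pfge = rng.expovariate(rate_pfge)
        if min(t_trigger, t_pfge) > mission_time:
            counts["neither"] += 1
        elif t_trigger < t_pfge:
            counts["isolation"] += 1      # trigger failed first
        else:
            counts["propagation"] += 1    # PFGE reached the system first
    return {name: count / runs for name, count in counts.items()}

if __name__ == "__main__":
    print(competing_outcomes(0.002, 0.001, 100.0))
    print(simulate(0.002, 0.001, 100.0))
```

Note that the "isolation" outcome only says the trigger failed first; whether the system then fails still depends on the remaining components and the structure function, while the "propagation" outcome implies system failure.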
Reliability of systems subject to the competing failures has been studied for both binary systems [28], [29] and multi-state systems [30]. All these works focus only on single-phase systems; no work has been done on the analysis of PMS considering the competing failures. In this paper, we develop an analytical and combinatorial method for analyzing the reliability of PMS subject to competing failure propagation and isolation effects.
The remainder of the paper is organized as follows. Section 2 presents a description of the system model considered in this paper. Section 3 describes preliminary methods to handle propagated failure and to analyze PMS in our study. Section 4 presents the proposed combinatorial approach for the reliability analysis of PMS subject to the competing failure isolation and propagation effects. Section 5 gives an illustrative example and detailed analysis of the example system using the proposed method. Verification using a Markov-based method is also presented. Section 6 gives a case study to further illustrate the application and advantages of the proposed method. Conclusions and future work are given in Section 7.
Section snippets
System model description
The system mission consists of multiple consecutive and non-overlapping phases. The system structure can change with the phases, which is expressed by phase-dependent fault tree models. The system components can also have phase-dependent and time-varying failure parameter values. Some system components can be used in all the phases; some are used only in specific phases. If a component is not used in a phase, it means that the component’s local failure makes no contribution to the failure of
Preliminary methods
In this section, we review the basics of the simple and efficient algorithm (SEA) and of a binary decision diagram (BDD) based algorithm, which are used to handle propagated failures and to analyze PMS without competing failure effects, respectively.
The proposed combinatorial approach
The suggested approach for the reliability analysis of PMS subject to competing failure isolation and propagation effects can be described as the following step-by-step procedure:
Step 1: Separate the propagated failures originating from all non-dependent components using the SEA method described in Section 3.1. If a propagated failure from any non-dependent component occurs, the PMS fails.
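The separation in Step 1 rests on the total probability law: either some non-dependent component produces a propagated failure (and the PMS fails), or none does and the remaining problem is solved conditionally. The sketch below illustrates only this aggregation step under the assumption that propagated failures of different non-dependent components are independent; the function name and inputs are hypothetical, and the conditional unreliability is taken as given rather than computed as in the paper.

```python
import math

def sea_unreliability(no_pf_probs, conditional_unreliability):
    """Aggregate system unreliability after separating propagated failures.

    no_pf_probs: for each non-dependent component, the probability that it
        causes no propagated failure during the mission (assumed independent).
    conditional_unreliability: system failure probability given that no such
        propagated failure occurs (supplied by the caller in this sketch).
    """
    p_no_pf = math.prod(no_pf_probs)  # no propagated failure from any component
    # Total probability law: P(fail) = P(PF) * 1 + P(no PF) * Q
    return (1.0 - p_no_pf) + p_no_pf * conditional_unreliability

if __name__ == "__main__":
    # Hypothetical values for two non-dependent components
    print(sea_unreliability([0.99, 0.995], 0.02))
```

Keeping the conditional problem separate means that it can be handled by any method suited to systems without propagated failures from those components.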
An illustrative example and verification
In this section, a two-phase system is analyzed to illustrate the application of our proposed method. The fault tree model for each phase is shown in Fig. 2. The example system is also analyzed using a Markov-based method to verify the proposed method. Because the Markov-based method can only be applied to systems with components having exponential time-to-failure distributions, the exponential distribution is assumed for each component for verification purpose. Table 1 illustrates the constant
A case study
In this section, a larger example with three phases is studied. Fig. 9 shows the fault tree model for each phase. In this example PMS, four computers B, C, D, and E work together to complete a three-phase mission task. In phase 1 and phase 3, only local computing is involved. Specifically, phase 1 fails when computer D and either of the two computers B and C fail; phase 3 fails when computer B and either of the two computers D and E fail. In phase 2, computers B and C need to access the
Conclusions and future work
This paper has proposed a combinatorial and analytical method for the reliability analysis of PMS subject to competing failure isolation and failure propagation effects. As illustrated through the examples, based on the total probability law, our proposed approach exercises the “decomposition and aggregation” strategy in decomposing the original reliability problem into a set of reduced problems. Those reduced problems are independent and thus can be solved in parallel given available computing
Acknowledgment
This work was supported in part by the US National Science Foundation under grant No. 0832594.
References (32)
et al. Reliability of k-out-of-n systems with phased-mission requirements and imperfect fault coverage. Reliability Engineering and System Safety (2012)
et al. An algorithm for reliability analysis of phased-mission systems. Reliability Engineering and System Safety (1999)
et al. Reliability and performance of multi-state systems with propagated failures having selective effect. Reliability Engineering and System Safety (2010)
et al. Multi-state systems with multi-fault coverage. Reliability Engineering and System Safety (2008)
et al. Combinatorial analysis of systems with competing failures subject to failure isolation and propagation effects. Reliability Engineering and System Safety (2010)
et al. Computationally efficient phased-mission reliability analysis for systems with variable configurations. IEEE Transactions on Reliability (1992)
Reliability evaluation of phased-mission systems with imperfect fault coverage and common-cause failures. IEEE Transactions on Reliability (2007)
et al. Phased-mission analysis for evaluating the effectiveness of aerospace computing-systems. IEEE Transactions on Reliability (1981)
et al. Analysis of mission-oriented systems. IEEE Transactions on Reliability (1969)
A unified method for analyzing mission reliability for fault tolerant computer systems. IEEE Transactions on Reliability (1973)
Reliability importance analysis of generalized phased-mission systems. International Journal of Performability Engineering
BDD-based algorithm for reliability analysis of phased-mission systems. IEEE Transactions on Reliability
Reliability of phased-mission systems
Analysis of generalized phased-mission system reliability, performance and sensitivity. IEEE Transactions on Reliability
Reliability analysis of phased missions
Dependability modeling and evaluation of multiple-phased systems using DEEM. IEEE Transactions on Reliability