Competing failure analysis in phased-mission systems with functional dependence in one of phases

https://doi.org/10.1016/j.ress.2012.07.004

Abstract

This paper proposes an algorithm for the reliability analysis of non-repairable phased-mission systems (PMS) subject to competing failure propagation and isolation effects. A propagated failure is a failure that originates from a system component and causes extensive damage to other system components. When the propagated failure affects all the system components and causes the entire system to fail, a propagated failure with global effect (PFGE) is said to occur. However, failure propagation can be isolated in systems subject to functional dependence (FDEP) behavior, where the failure of a component (the trigger component) causes some other components (the dependent components) to become inaccessible or unusable, that is, isolated from the system; subsequent failures of these dependent components then have no effect on the system failure behavior. Conversely, if a PFGE from a dependent component occurs before the trigger failure, the failure propagation effect takes place and the entire system fails. In summary, a PFGE has two distinct consequences, determined by the competition in the time domain between the failure isolation and failure propagation effects. Existing works on such competing failures focus only on single-phase systems. However, many real-world systems are PMS, which involve multiple, consecutive and non-overlapping phases of operations or tasks. Considering competing failures in PMS is challenging because PMS exhibit dynamics in system configuration and component behavior, as well as statistical dependencies across phases for a given component. To this end, a combinatorial method is developed to address the competing failure effects in the reliability analysis of binary non-repairable PMS. The proposed method is verified against a Markov-based method through a numerical example.
Unlike the Markov-based approach, which is limited to exponential distributions, the proposed approach places no restriction on the type of time-to-failure distribution of the system components. A case study illustrates this advantage of the proposed method.

Highlights

► Reliability of non-repairable phased-mission systems subject to competing failure propagation and isolation effects is analyzed.
► The proposed combinatorial and analytical algorithm exercises the “decomposition and aggregation” strategy.
► The proposed algorithm has no limitation on the type of time-to-failure distributions for system components.

Introduction

In many real-world applications, such as aerospace, nuclear power, airborne weapon systems and distributed computing systems, the system mission involves multiple phases [1], [2], [3], [4], [5], [6]. In different phases, the system must accomplish different tasks and may be subject to different stresses, environmental conditions, and reliability requirements. Thus, system configuration, success criteria, and component behavior may vary from phase to phase [1]. A classic example is an aircraft flight, which involves taxi, take-off, ascent, level flight, descent, and landing phases [2], [6]. For an aircraft with two engines, usually only one engine is required during the taxi phase, but both engines are necessary during take-off. In addition, the engines are more likely to fail during take-off because of the high stress in this phase compared with other phases of the flight profile [7]. Systems used in such multi-phase missions are referred to as phased-mission systems (PMS).

Besides the dynamics in system configuration, success criteria, and component behavior described above, the statistical dependence of component states across phases (in particular, the state of a component at the beginning of a phase must be identical to its state at the end of the previous phase) also makes the reliability analysis of PMS difficult [8]. Ref. [9] reviews the state of the art in PMS reliability modeling and analysis techniques and identifies two classes of approaches: analytical methods [1], [2], [3], [4], [5], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17] and simulation methods [18], [19]. The analytical methods can be further classified into three categories: state space-oriented methods [1], [13], [14], [15], [16], combinatorial methods [3], [4], [5], [8], [10], [11], [12], and hybrid methods that combine the former two as appropriate [2], [17]. To the best of our knowledge, no existing work on PMS has addressed competing failure propagation and isolation effects [20].
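The cross-phase dependence can be made concrete for the two-engine aircraft example with a small Python sketch. It is not the BDD-based method used in this paper; it simply enumerates, for each non-repairable engine, the phase in which that engine first fails. The phase durations and failure rates are hypothetical, and exponential lifetimes within each phase are assumed. Because each phase structure here is k-out-of-n (coherent) and no repair occurs, a phase fails exactly when its structure is failed at the end of the phase.

```python
import itertools
import math

# Hypothetical phase data for the two-engine aircraft example:
# (phase name, duration in hours, per-engine failure rate per hour, engines required)
PHASES = [
    ("taxi", 0.5, 1e-4, 1),
    ("take-off", 0.1, 1e-3, 2),
]
N_ENGINES = 2


def first_failure_phase_probs(phases):
    """Return p[j] = Pr(an engine first fails in phase j); p[-1] = Pr(survives all).

    A non-repairable engine keeps its state across phase boundaries, so its whole
    mission history is captured by the phase in which it first fails.
    """
    probs, survive = [], 1.0
    for _, tau, lam, _ in phases:
        p_here = survive * (1.0 - math.exp(-lam * tau))
        probs.append(p_here)
        survive -= p_here
    probs.append(survive)
    return probs


def mission_reliability(phases, n):
    """Exact mission reliability by enumerating every engine's failure phase."""
    probs = first_failure_phase_probs(phases)
    total = 0.0
    for history in itertools.product(range(len(phases) + 1), repeat=n):
        # Engine i is still working at the end of phase j iff history[i] > j.
        if all(sum(f > j for f in history) >= k
               for j, (_, _, _, k) in enumerate(phases)):
            total += math.prod(probs[f] for f in history)
    return total


def naive_phase_product(phases, n):
    """Incorrect shortcut: assumes every phase starts with all engines as new."""
    r = 1.0
    for _, tau, lam, k in phases:
        p = math.exp(-lam * tau)
        r *= sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
    return r


print(mission_reliability(PHASES, N_ENGINES), naive_phase_product(PHASES, N_ENGINES))
```

The naive product of separately computed phase reliabilities overestimates mission reliability here, because it lets take-off start with two new engines even though one may already have failed (harmlessly) during taxi.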

Specifically, a propagated failure (the failure propagation effect) occurs when a failure originating from a system component damages other system components besides the component itself. Furthermore, if the failure affects all the system components and thus causes the entire system to fail, a propagated failure with global effect (PFGE) occurs [21]. There are two major causes of PFGE: imperfect fault coverage resulting from malfunction of the system’s automatic fault detection/recovery mechanism [10], [22], [23], [24], [25], and the destructive effect (e.g., explosion, blackout, overheating, voltage surge) that a failed component may have on other system components [20]. However, in systems subject to functional dependence (FDEP) behavior, where the failure of a trigger component causes other components (referred to as dependent components) to become inaccessible or unusable, a PFGE does not always cause total system failure. In particular, if the trigger component fails before the PFGE of a dependent component occurs, the failure isolation effect takes place, and whether the entire system fails depends on the remaining operational components and the system structure function. In this case, the failure of the trigger component not only isolates the dependent components from the rest of the system but also makes the system insensitive to any failure originating from them. However, if the trigger component is still functioning when the PFGE of a dependent component occurs, the propagation effect takes place, causing the overall system to fail.

Consider, for example, a computer network in which computers communicate through network interface cards (NIC). Each NIC is a trigger component, and the computers connected to it are dependent components. A computer virus on a connected computer may spread through the network and crash it entirely. However, if the NIC fails first, the failure isolation effect takes place, and the virus affects only the local computer instead of the entire network [26], [27]. In summary, in a system subject to functional dependence behavior, a propagated failure has two distinct consequences, determined by the competition in the time domain between the failure of the trigger component and the PFGE of a dependent component, or in other words, between the failure isolation and failure propagation effects. Such competing failure behavior must be addressed for accurate system reliability analysis.
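The time-domain competition for a single trigger/dependent pair can be sketched in Python. The sketch assumes (our assumption, not necessarily the paper's formulation) that the trigger failure time and the dependent component's latent local- and global-failure times are mutually independent, so that Pr(propagation by t) is the integral over [0, t] of f_G(s) R_L(s) R_T(s) ds, where f_G is the density of the dependent component's global-failure time and R_L, R_T are the survival functions of its local-failure time and of the trigger. The integral is evaluated with Simpson's rule, so any lifetime distribution can be plugged in; the exponential closed form is included only as a check. All parameter values are hypothetical.

```python
import math


def outcome(t_trigger, t_dep_global, mission_time):
    """Classify one realization of a trigger/dependent pair (times may be math.inf)."""
    if t_dep_global > mission_time:
        return "no PFGE"          # dependent component never fails globally in the mission
    if t_trigger < t_dep_global:
        return "isolation"        # trigger failed first: the PFGE is contained
    return "propagation"          # PFGE while the trigger still works: system fails


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (made even) sub-intervals."""
    n += n % 2
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3


def p_propagation(f_global, surv_local, surv_trigger, t):
    """Pr(the dependent PFGE occurs before the trigger fails and before t)."""
    return simpson(lambda s: f_global(s) * surv_local(s) * surv_trigger(s), 0.0, t)


def exponential(rate):
    """Return (pdf, survival function) of an exponential lifetime."""
    return (lambda s: rate * math.exp(-rate * s)), (lambda s: math.exp(-rate * s))


def weibull_survival(shape, scale):
    """Survival function of a Weibull lifetime (e.g. a wear-out trigger)."""
    return lambda s: math.exp(-((s / scale) ** shape))


# Hypothetical rates (per hour) and mission time (hours).
lam_t, lam_l, lam_g, t = 2e-3, 1e-3, 5e-4, 1000.0
f_g, _ = exponential(lam_g)
_, r_l = exponential(lam_l)
_, r_t = exponential(lam_t)
numeric = p_propagation(f_g, r_l, r_t, t)
total = lam_t + lam_l + lam_g
closed_form = lam_g / total * (1 - math.exp(-total * t))   # exponential-only check

# Same pair, but with a Weibull trigger instead of an exponential one.
r_t_weibull = weibull_survival(2.0, 800.0)
print(numeric, closed_form, p_propagation(f_g, r_l, r_t_weibull, t))
```

Swapping in a non-exponential survival function changes only one argument, which mirrors the point that a combinatorial treatment need not be tied to exponential lifetimes.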

Reliability of systems subject to competing failures has been studied for both binary systems [28], [29] and multi-state systems [30]. All these works focus only on single-phase systems; no work has addressed PMS subject to competing failures. In this paper, we develop an analytical and combinatorial method for analyzing the reliability of PMS subject to competing failure propagation and isolation effects.

The remainder of the paper is organized as follows. Section 2 describes the system model considered in this paper. Section 3 reviews the preliminary methods used in this study to handle propagated failures and to analyze PMS. Section 4 presents the proposed combinatorial approach for the reliability analysis of PMS subject to competing failure isolation and propagation effects. Section 5 analyzes an illustrative example system in detail using the proposed method and verifies the results with a Markov-based method. Section 6 presents a case study to further illustrate the application and advantages of the proposed method. Section 7 gives conclusions and directions for future work.

Section snippets

System model description

The system mission consists of multiple consecutive and non-overlapping phases. The system structure can change from phase to phase and is expressed by phase-dependent fault tree models. The system components can also have phase-dependent and time-varying failure parameter values. Some components are used in all phases; others are used only in specific phases. If a component is not used in a phase, its local failure makes no contribution to the failure of

Preliminary methods

In this section, we review the basics of the simple and efficient algorithm (SEA) and a binary decision diagram (BDD)-based algorithm, which are used, respectively, to handle propagated failures and to analyze PMS without competing failure effects.
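A minimal Python sketch of the SEA idea for a single-phase system, assuming independent components, each with mutually exclusive local and global (PFGE) failure probabilities q_L,i and q_G,i at the mission time. SEA separates PFGEs by conditioning on whether any occurs: with P_u = Π(1 - q_G,i) and conditional local-failure probabilities q̃_i = q_L,i / (1 - q_G,i), the system unreliability is U = 1 - P_u + P_u · Q(q̃), where Q is the unreliability of the fault tree with PFGEs ignored. Here Q is evaluated by brute-force enumeration, standing in for a BDD evaluation, and a full three-state enumeration is included as a cross-check. The fault tree and probabilities are hypothetical.

```python
import itertools
import math


def sea_unreliability(q_local, q_global, structure_fails):
    """SEA: U = 1 - P_u + P_u * Q(q_tilde), with P_u = Pr(no PFGE anywhere)."""
    p_u = math.prod(1.0 - qg for qg in q_global)
    # Local-failure probability of each component conditioned on "no PFGE".
    q_tilde = [ql / (1.0 - qg) for ql, qg in zip(q_local, q_global)]
    q_cond = 0.0  # Q(q_tilde): fault-tree unreliability with PFGEs ignored
    for failed in itertools.product((False, True), repeat=len(q_local)):
        if structure_fails(failed):
            q_cond += math.prod(q if f else 1.0 - q for q, f in zip(q_tilde, failed))
    return 1.0 - p_u + p_u * q_cond


def brute_force(q_local, q_global, structure_fails):
    """Cross-check: enumerate ok / local / global for every component."""
    total = 0.0
    for states in itertools.product(("ok", "local", "global"), repeat=len(q_local)):
        p = math.prod(
            {"ok": 1.0 - ql - qg, "local": ql, "global": qg}[s]
            for s, ql, qg in zip(states, q_local, q_global)
        )
        # Any PFGE fails the system; otherwise the fault tree decides.
        if "global" in states or structure_fails(tuple(s == "local" for s in states)):
            total += p
    return total


def parallel(failed):
    """Hypothetical two-component parallel system: fails only if both fail."""
    return all(failed)


q_local, q_global = [0.1, 0.2], [0.01, 0.02]
print(sea_unreliability(q_local, q_global, parallel), brute_force(q_local, q_global, parallel))
```

Even though the parallel structure tolerates a single local failure, any PFGE fails the system outright, which is why the 1 - P_u term appears separately from the fault-tree term.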

The proposed combinatorial approach

The proposed approach for the reliability analysis of PMS subject to competing failure isolation and propagation effects can be summarized in the following step-by-step procedure:

  • Step 1: Separate the propagated failures originating from all non-dependent components using the SEA method described in Section 3.1. If a propagated failure from any non-dependent component occurs, the PMS fails:
    Pr(PMS fails) = Pr[PMS fails | (at least one non-dependent component fails globally during the mission)] × Pr(at least one non-
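Step 1 can be sketched in Python under the total probability law, assuming exponential lifetimes within each phase and hypothetical phase-dependent local and global failure rates for each non-dependent component. Since any PFGE from a non-dependent component fails the PMS, the conditional probability given such a PFGE is 1, and what remains is the probability P_u that no non-dependent component fails globally at any point in the mission. The argument `q_cond` is a hypothetical placeholder for Pr(PMS fails | no PFGE from non-dependent components), the reduced problem left after this separation.

```python
import math


def mission_global_prob(phase_params):
    """Pr(a component fails with a PFGE at some point during the mission).

    phase_params: one (duration, local_rate, global_rate) tuple per phase, with
    exponential lifetimes inside each phase. A component that has already failed
    (locally or globally) in an earlier phase cannot fail again later.
    """
    survive, q_global = 1.0, 0.0
    for tau, lam_local, lam_global in phase_params:
        lam = lam_local + lam_global
        if lam == 0.0:
            continue  # component cannot fail in this phase
        p_fail = 1.0 - math.exp(-lam * tau)
        q_global += survive * p_fail * lam_global / lam  # first failure here, global mode
        survive *= 1.0 - p_fail
    return q_global


def step1_unreliability(non_dependent, q_cond):
    """Pr(PMS fails) = 1 * (1 - P_u) + q_cond * P_u, with P_u = Pr(no non-dependent PFGE)."""
    p_u = math.prod(1.0 - mission_global_prob(c) for c in non_dependent)
    return (1.0 - p_u) + p_u * q_cond


# Hypothetical two-phase parameters for two non-dependent components.
comp_a = [(10.0, 1e-3, 1e-4), (5.0, 2e-3, 5e-4)]
comp_b = [(10.0, 5e-4, 0.0), (5.0, 1e-3, 2e-4)]
print(step1_unreliability([comp_a, comp_b], q_cond=0.05))
```

Carrying the survival probability from phase to phase is what enforces the cross-phase dependence: a component that failed locally in phase 1 can no longer produce a PFGE in phase 2.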

An illustrative example and verification

In this section, a two-phase system is analyzed to illustrate the application of the proposed method. The fault tree model for each phase is shown in Fig. 2. The example system is also analyzed using a Markov-based method to verify the proposed method. Because the Markov-based method applies only to systems whose components have exponential time-to-failure distributions, each component is assumed to be exponentially distributed for verification purposes. Table 1 illustrates the constant
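The kind of Markov-based cross-check described above can be illustrated on a single trigger/dependent pair with exponential lifetimes. The Python sketch below is our own minimal continuous-time Markov chain, not the Markov model of the example system: from the initial state, the trigger can fail (isolating the dependent component), the dependent component can fail locally, or it can fail with a PFGE while the trigger still works (system failure). The Kolmogorov forward equations dp/dt = pQ are integrated with a fixed-step fourth-order Runge-Kutta scheme and compared with the closed form λ_G/Λ · (1 - exp(-Λt)), Λ = λ_T + λ_L + λ_G. Rates and mission time are hypothetical.

```python
import math


def rk4_transient(Q, p0, t, steps=2000):
    """Integrate the CTMC forward equations dp/dt = p Q from 0 to t with RK4."""
    n = len(p0)
    h = t / steps

    def deriv(p):
        return [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]

    p = list(p0)
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([p[i] + h / 2 * k1[i] for i in range(n)])
        k3 = deriv([p[i] + h / 2 * k2[i] for i in range(n)])
        k4 = deriv([p[i] + h * k3[i] for i in range(n)])
        p = [p[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(n)]
    return p


# Hypothetical rates (per hour): trigger, dependent local, dependent global (PFGE).
lam_t, lam_l, lam_g, t = 2e-3, 1e-3, 5e-4, 1000.0
total = lam_t + lam_l + lam_g
# States: 0 = all working, 1 = trigger failed (dependent isolated),
#         2 = dependent failed locally, 3 = PFGE propagated (system failed).
# States 1 and 2 are kept absorbing: no later event in them can cause propagation.
Q = [
    [-total, lam_t, lam_l, lam_g],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
p_t = rk4_transient(Q, [1.0, 0.0, 0.0, 0.0], t)
markov = p_t[3]
closed_form = lam_g / total * (1 - math.exp(-total * t))
print(markov, closed_form)
```

The agreement holds only because every rate is constant; a Weibull trigger would make the generator time-dependent, which is the limitation of the Markov route noted in the text.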

A case study

In this section, a larger example with three phases is studied. Fig. 9 shows the fault tree model for each phase. In this example PMS, four computers B, C, D, and E work together to complete a three-phase mission task. In phases 1 and 3, only local computing is involved. Specifically, phase 1 fails when computer D and at least one of computers B and C fail; phase 3 fails when computer B and at least one of computers D and E fail. In phase 2, computers B and C need to access the
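Reading the textual description of phases 1 and 3 as Boolean fault-tree functions gives D ∧ (B ∨ C) for phase 1 and B ∧ (D ∨ E) for phase 3. The Python sketch below encodes this reading (it is not a reproduction of Fig. 9, and phase 2 is left out) and finds the minimal cut sets of each phase by brute force. It shows that the cut set {B, D} is shared by both phases, which is exactly where the cross-phase dependence of component states matters.

```python
import itertools

COMPONENTS = ("B", "C", "D", "E")


def phase1_fails(failed):
    """Phase 1: computer D fails and at least one of B, C fails."""
    return "D" in failed and ("B" in failed or "C" in failed)


def phase3_fails(failed):
    """Phase 3: computer B fails and at least one of D, E fails."""
    return "B" in failed and ("D" in failed or "E" in failed)


def minimal_cut_sets(fails, components=COMPONENTS):
    """Brute-force minimal cut sets of a coherent (monotone) fault-tree function."""
    cuts = []
    for size in range(1, len(components) + 1):
        for combo in itertools.combinations(components, size):
            candidate = frozenset(combo)
            # Sets are visited by increasing size, so a failing set is minimal
            # exactly when it contains no previously found cut set.
            if fails(candidate) and not any(c <= candidate for c in cuts):
                cuts.append(candidate)
    return cuts


for name, fn in (("phase 1", phase1_fails), ("phase 3", phase3_fails)):
    print(name, [sorted(c) for c in minimal_cut_sets(fn)])
```

Brute force is exponential in the number of components and is only meant to check the reading of the text; BDD-based PMS methods avoid this enumeration.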

Conclusions and future work

This paper has proposed a combinatorial and analytical method for the reliability analysis of PMS subject to competing failure isolation and failure propagation effects. As illustrated through the examples, the proposed approach applies the total probability law in a “decomposition and aggregation” strategy that decomposes the original reliability problem into a set of reduced problems. These reduced problems are independent and thus can be solved in parallel given available computing
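The aggregation step can be sketched generically in Python, assuming the reduced problems are conditioned on mutually exclusive and exhaustive events E_i so that Pr(system fails) = Σ_i Pr(E_i) · Pr(system fails | E_i). The `aggregate` helper and the example events are hypothetical; each reduced problem is an independent callable, so the conditional probabilities can be computed concurrently before being combined.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def aggregate(cases, max_workers=4):
    """Total probability law over independently solved reduced problems.

    cases: list of (pr_event, solve) pairs; the events must be mutually exclusive
    and exhaustive, and solve() returns Pr(system fails | event).
    """
    if abs(math.fsum(w for w, _ in cases) - 1.0) > 1e-9:
        raise ValueError("event probabilities must sum to 1")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        conditional = list(pool.map(lambda case: case[1](), cases))
    return math.fsum(w * q for (w, _), q in zip(cases, conditional))


# Hypothetical partition: E_1 = a PFGE occurs (system certainly fails),
# E_2 = no PFGE, solved by some reduced-problem solver (stubbed here).
cases = [(0.03, lambda: 1.0), (0.97, lambda: 0.02)]
print(aggregate(cases))
```

Threads keep the sketch simple, but because of the global interpreter lock they do not run CPU-bound pure-Python solvers in parallel; for heavy reduced problems, `ProcessPoolExecutor` with module-level (picklable) solver functions instead of lambdas would be the usual choice.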

Acknowledgment

This work was supported in part by the US National Science Foundation under grant No. 0832594.

References (32)

  • L Xing

    Reliability importance analysis of generalized phased-mission systems

    International Journal of Performability Engineering

    (2007)
  • X Zang et al.

    BDD-based algorithm for reliability analysis of phased-mission systems

    IEEE Transactions on Reliability

    (1999)
  • L Xing et al.

    Reliability of phased-mission systems

  • L Xing et al.

    Analysis of generalized phased-mission system reliability, performance and sensitivity

    IEEE Transactions on Reliability

    (2002)
  • JD Esary et al.

    Reliability analysis of phased missions

  • A Bondavalli et al.

    Dependability modeling and evaluation of multiple-phased systems using DEEM

    IEEE Transactions on Reliability

    (2004)
  • Cited by (57)

    • Optimal structure of multiple resource supply systems with storages

      2023, Reliability Engineering and System Safety