A stochastic alternating renewal process model for unavailability analysis of standby safety equipment

https://doi.org/10.1016/j.ress.2015.03.005

Highlights

  • Analyzes unavailability as a stochastic alternating renewal process.

  • Derives a renewal integral equation for computing instantaneous unavailability.

  • Includes general distributions for failure and repair times.

  • Analyzes unavailability under an age-based preventive maintenance policy.

  • Shows that average unavailability cannot capture the equipment aging effect.

Abstract

The paper presents a stochastic approach to analyze the instantaneous unavailability of standby safety equipment caused by latent failures. The problem of unavailability analysis is formulated as a stochastic alternating renewal process without any restrictions on the form of the probability distributions assigned to the time to failure and the repair duration. An integral equation for point unavailability is derived and numerically solved for a given maintenance policy. The paper also incorporates an age-based preventive maintenance policy with random repair time. For aging equipment, the asymptotic limit, or average unavailability, should be used with caution, because it cannot model an increasing trend in unavailability that results from an increasing hazard rate (i.e., aging) of the time-to-failure distribution.

Introduction

A variety of safety equipment is installed in a nuclear plant to prevent accidents that can result in core damage and a large release of radiation. In fact, safety systems are an integral part of any industrial processing plant. A piece of safety equipment can be an assembly of many sub-systems or components, and it can be small or large, such as emergency diesel generators, backup pumps, valves, and batteries. In this paper, “system” means a single entity, and unavailability is analyzed for a single unit or piece of equipment.

Since a nuclear safety system mostly remains in the standby or idle state, a system failure is not immediately noticed. The danger of a latent failure is that the system would not be available to respond to a demand arising randomly in time due to an abnormal condition or transient. Therefore, safety systems are periodically inspected and tested to detect latent failures, and are maintained to ensure their operational readiness.

Although safety equipment is diligently inspected and maintained, the potential effect of aging on nuclear safety has become a common concern among regulators worldwide. As the nuclear fleet in Canada and many other countries approaches the end of its first life, it is important to evaluate the adequacy of existing safety systems when planning plant refurbishment projects. The asymptotic limit of unavailability, which is the same as the limiting time-average unavailability, is commonly used in probabilistic safety assessment. However, the effect of equipment aging, i.e., an increase in hazard rate with time, cannot be accurately reflected by a measure such as average unavailability. With this background, a comprehensive investigation has been undertaken to develop a general approach to evaluate the instantaneous unavailability of standby safety systems affected by latent failures. This paper presents a stochastic alternating renewal process model to analyze the effects of aging and preventive maintenance (PM) on equipment unavailability. Another goal of this paper is to present a clear and structured derivation of instantaneous and asymptotic unavailability from the basics of probability theory. The proposed approach provides a sound basis for setting up an inspection and maintenance program.

Maintenance policies can be classified into two main categories: calendar-based and time-based policies. In a calendar-based policy, equipment testing and maintenance take place on fixed dates, such as a regular schedule of plant maintenance outages.

In contrast, under a time-based policy, equipment maintenance takes place at a fixed time interval from the most recent maintenance. Under this policy, inspections are done periodically within a renewal cycle, but the actual calendar time of an inspection is essentially random because of random occurrences of failures and random durations of repairs. In a nuclear plant, both calendar-based and time-based policies are employed, depending on factors such as maintenance planning and equipment accessibility.

This paper deals with the analysis of a time-based maintenance policy defined as follows. Consider that a piece of safety equipment can fail at a random time $X$, which makes it unavailable to respond to a demand. The system remains unavailable until the failure is detected by an inspection and repaired. Inspections are periodic, and the duration of a repair (or maintenance) is a random variable $Y$. A repair restores the system to an as-good-as-new state.

This policy is described in Fig. 1. Starting with a new system at time 0, inspections are carried out periodically at an interval $\tau_I$. The system first fails at time $X_1$; the failure is detected at the following inspection, at time $3\tau_I$, and is then repaired, which takes a random time $Y_1$. Thus, the system is unavailable in the interval $[X_1, 3\tau_I + Y_1)$. The system is renewed at time $S_1 = 3\tau_I + Y_1$, after which inspections resume at an interval $\tau_I$, so that inspections in the second cycle take place at times $S_1 + \tau_I$, $S_1 + 2\tau_I$, and so on. In summary, the calendar time of a future inspection is random because failures occur at random times and repair durations are random.
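The arithmetic of the first cycle in Fig. 1 can be sketched in a few lines of Python. This is an illustration only: the function name `first_cycle` and the sample values ($X_1 = 9.3$, $Y_1 = 0.4$, $\tau_I = 4$ months) are hypothetical and do not come from the paper, and the detection time is taken to be the first inspection at or after the failure, $\lceil X_1/\tau_I \rceil \tau_I$.

```python
import math


def first_cycle(x1, y1, tau_i):
    """Timeline of the first renewal cycle of Fig. 1 (time-based policy, no PM)."""
    k = math.ceil(x1 / tau_i)        # index of the inspection that finds the failure
    detection = k * tau_i            # detection time, e.g. 3 * tau_i in Fig. 1
    s1 = detection + y1              # renewal time S_1 after a repair of length Y_1
    downtime = s1 - x1               # the system is unavailable on [X_1, S_1)
    next_inspections = [s1 + j * tau_i for j in (1, 2)]  # S_1 + tau_i, S_1 + 2 tau_i
    return detection, s1, downtime, next_inspections


# Hypothetical sample values: failure at 9.3 months, repair lasting 0.4 months
detection, s1, downtime, nxt = first_cycle(9.3, 0.4, 4.0)
print(detection, round(s1, 2), round(downtime, 2), [round(t, 2) for t in nxt])
```

With these hypothetical inputs the failure is found at the third inspection (time 12.0, i.e., $3\tau_I$), the renewal occurs at about $S_1 = 12.4$, the downtime is about 3.1 months, and the next two inspections fall at about 16.4 and 20.4.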

The first key problem is to estimate the point (or instantaneous) unavailability at any time t, which is the probability that the system is unavailable at time t due to a failure or due to an ongoing repair after the detection of a failure.
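One way to make this definition concrete is a Monte Carlo estimate: simulate many sample paths of the time-based policy and count the fraction that are down at time $t$. This is a sketch under assumptions that are not taken from the paper: no PM, a Weibull time to failure, an exponential repair time, and the hypothetical function name `simulate_point_unavailability`. It is a simulation cross-check, not the renewal-equation method developed in the paper. Note that Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
import math
import random


def simulate_point_unavailability(t, tau_i, shape, scale, mean_y,
                                  n_paths=50_000, seed=1):
    """Monte Carlo estimate of U(t) = P(system is unavailable at time t), without PM."""
    rng = random.Random(seed)
    down = 0
    for _ in range(n_paths):
        start = 0.0  # renewal time at which the current cycle began
        while True:
            x = rng.weibullvariate(scale, shape)      # time to failure (scale, shape order)
            detect = math.ceil(x / tau_i) * tau_i    # next inspection after the failure
            end = start + detect + rng.expovariate(1.0 / mean_y)  # renewal after repair
            if t < end:
                # t lies in this cycle: the system is down once the failure has occurred
                down += (t >= start + x)
                break
            start = end
    return down / n_paths


# Weibull(shape 2, scale 20) lifetime, mean repair 0.5 months, no PM: U(20)
print(simulate_point_unavailability(20.0, tau_i=4.0, shape=2.0, scale=20.0, mean_y=0.5))
```

For $t < \tau_I$ no renewal can have occurred yet, so the estimate should be close to $F_X(t)$, which makes a convenient sanity check.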

The second problem is to evaluate the effect of an age-based preventive maintenance (PM) policy in which the system is renewed as soon as it reaches an age $\tau_M$. Renewal by PM also takes a random time $Z$. The inspection interval ($\tau_I$) and the PM age ($\tau_M$) can be treated as parameters for optimizing system availability. This paper analyzes these two problems in a general setting, i.e., with no restrictions on the type of distributions assigned to the time to failure ($X$) and the repair durations, $Y$ and $Z$.
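To see how $\tau_M$ acts as a tuning parameter, the long-run fraction of time the system is down can be estimated by simulating many independent renewal cycles and dividing total downtime by total elapsed time (the renewal-reward ratio). This sketch is illustrative and assumes a Weibull $X$, exponential $Y$ and $Z$, and a $\tau_M$ that is a multiple of $\tau_I$, so that PM coincides with an inspection; the function name `long_run_unavailability` is hypothetical. Because it targets the long-run average, it deliberately ignores the time dependence that point unavailability captures.

```python
import math
import random


def long_run_unavailability(tau_i, tau_m, shape, scale, mean_y, mean_z,
                            n_cycles=100_000, seed=7):
    """Estimate total downtime / total time under an age-based PM policy.

    Assumes tau_m is a multiple of tau_i; tau_m=math.inf disables PM.
    """
    rng = random.Random(seed)
    total_down = total_time = 0.0
    for _ in range(n_cycles):
        x = rng.weibullvariate(scale, shape)  # argument order is (scale, shape)
        if x <= tau_m:
            # failure detected at the next inspection, then corrective maintenance Y
            detect = math.ceil(x / tau_i) * tau_i
            y = rng.expovariate(1.0 / mean_y)
            total_time += detect + y
            total_down += detect - x + y
        else:
            # no failure before age tau_m: preventive maintenance of duration Z
            z = rng.expovariate(1.0 / mean_z)
            total_time += tau_m + z
            total_down += z
    return total_down / total_time


# Sweep the PM age with the distributions of Example 1 (tau_I = 4 months)
for tau_m in (8.0, 16.0, 24.0, 32.0, math.inf):
    u = long_run_unavailability(4.0, tau_m, 2.0, 20.0, 0.5, 0.25, n_cycles=50_000)
    print(f"tau_M = {tau_m}: long-run unavailability ~ {u:.4f}")
```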

This paper does not intend to provide a formal review of the literature related to unavailability analysis, since comprehensive reviews, such as Vaurio [1], are already available. Only a few key papers needed to place this paper in relation to the existing literature are briefly reviewed.

Unavailability analysis became an active research topic with the advent of nuclear power generation technology [2]. A rigorous probabilistic treatment of this problem was presented by Caldarola [3]. That paper formulated the unavailability problem using a stochastic renewal process model, derived asymptotic solutions, and provided analytical formulas for the standard case of exponentially distributed time-to-failure and time-to-repair variables.

A comprehensive analysis of the unavailability of nuclear safety equipment was presented in a seminal paper by Vaurio [4]. That paper introduced general failure time, test duration and repair time distributions into the analysis. Its model assumed that the system is renewed at each inspection and test. This assumption does not hold in many practical instances where inspection or testing has no significant renewal effect on component reliability, and many follow-up studies therefore aimed at relaxing it. For example, Hilsmeier et al. [5] did not invoke the assumption of renewal by inspection in their unavailability analysis. Vaurio presented several studies related to unavailability analysis and reviewed them in [1]. That review also classified different types of unavailability problems depending on the effect of testing and repair on the component lifetime distribution or reliability. It derived the integral equation for point unavailability under the assumption that maintenance and repair actions are instantaneous, and it pointed out how to account for a non-instantaneous mean downtime when that downtime is smaller than the test interval. The cost impact of system unavailability was analyzed in Vaurio [6] for a time-based maintenance policy similar to that described in Section 1.2. That study derived the average cost rate and average unavailability considering the finite time of repair and maintenance. Since it did not evaluate the instantaneous unavailability, the probability distributions of the repair and maintenance durations do not appear in its analysis.

Evaluation of the time-average unavailability is appealing due to its analytical simplicity: it requires only the mean length of a (single) renewal cycle and the mean downtime in a single renewal cycle, which effectively eliminates the need to formulate and solve a stochastic renewal equation. However, the average measure cannot capture the effect of equipment aging reflected by an increasing hazard rate, as the examples in this paper show. Past studies that ignored repair duration or employed the asymptotic solution are not reviewed in the subsequent discussion.
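The two means mentioned above, the mean cycle length $E[T]$ and the mean downtime per cycle $E[D]$, can be computed directly for a Weibull time to failure without PM. The sketch below is an illustration under stated assumptions (hypothetical function name `asymptotic_unavailability`, no PM, repair time with mean $\mu_Y$): it uses $E[\lceil X/\tau_I\rceil\tau_I] = \tau_I \sum_{k \ge 0} P(X > k\tau_I)$ for the mean detection time and the downtime per cycle $D = \lceil X/\tau_I\rceil\tau_I - X + Y$, and it returns $E[D]/E[T]$ together with the Weibull mean $E[X] = \beta\,\Gamma(1 + 1/\alpha)$.

```python
import math


def asymptotic_unavailability(tau_i, shape, scale, mean_y, tol=1e-15):
    """Limiting unavailability E[D] / E[T] for a Weibull lifetime and no PM."""

    def reliability(x):
        # Weibull reliability function R(x) = P(X > x)
        return math.exp(-((x / scale) ** shape))

    # Mean detection time E[ceil(X / tau_i) * tau_i] = tau_i * sum_{k >= 0} R(k * tau_i)
    mean_detect, k = 0.0, 0
    while True:
        r = reliability(k * tau_i)
        mean_detect += tau_i * r
        if r < tol:
            break
        k += 1

    mean_x = scale * math.gamma(1.0 + 1.0 / shape)  # Weibull mean time to failure
    mean_down = mean_detect - mean_x + mean_y       # E[D]: undetected time plus repair
    mean_cycle = mean_detect + mean_y               # E[T]: detection time plus repair
    return mean_x, mean_down / mean_cycle


mean_x, u_inf = asymptotic_unavailability(tau_i=4.0, shape=2.0, scale=20.0, mean_y=0.5)
print(f"MTTF = {mean_x:.2f} months, limiting unavailability = {u_inf:.4f}")
```

With the lifetime and repair parameters of Example 1 ($\alpha = 2$, $\beta = 20$ months, $\tau_I = 4$ months, $\mu_Y = 0.5$ months), the returned mean is about 17.72 months, matching the mean time to failure quoted there, and the limiting unavailability is about 0.124. Being a single number, it says nothing about how unavailability evolves as the equipment ages.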

The time-based maintenance policy with random repair time was analyzed by Vaurio [6] to evaluate the average unavailability and cost rate. Cui and Xie [7] considered both discrete and continuous repair time distributions. However, an age-based PM policy was not analyzed in that study.

Although the calendar-based maintenance policy is not analyzed in this paper, interested readers are referred to Dialynas and Michos [8] for details. Tang et al. [9] extended this work by including a finite and constant time of inspection. Unavailability can also be caused by human error and imperfect inspections, though these issues are not considered in this paper. Similarly, unavailability of a system with n components is not analyzed. Readers are referred to Li et al. [10] for recent results in the analysis of k-out-of-n systems.

The main aim of this paper is to present a general stochastic alternating renewal process model for analyzing instantaneous unavailability in which the failure and repair times are modelled using general probability distributions. To mitigate adverse effects of aging on safety, the proposed model also includes an age-based PM policy with random duration of repair.

The paper is organized as follows. Section 2 presents the basic terminology and the elements of the stochastic renewal process model. The formulations of instantaneous (or point) and asymptotic unavailability are presented in Section 3. Section 4 presents numerical examples, and the conclusions of this study are presented in the last section. Additional analytical derivations are presented in Appendix A.

Section snippets

Terminology and assumptions

With reference to the time-based maintenance policy shown in Fig. 1, key assumptions and notations are described below:

  • The time to failure, $X$, of the equipment is a random variable with cumulative distribution function (CDF) $F_X(x)$ and reliability function $\bar{F}_X(x) = 1 - F_X(x)$. A failure at time $X$ makes the equipment unavailable.

  • To restore the equipment after a failure, corrective maintenance (CM) is required. The time required for CM, denoted $Y$, is a random variable with CDF $F_Y(y)$. A
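Aging enters the model through the shape of $F_X$. As a brief sketch (the function names are hypothetical), the Weibull reliability function $\bar{F}_X(x) = \exp[-(x/\beta)^\alpha]$ and hazard rate $h(x) = (\alpha/\beta)(x/\beta)^{\alpha-1}$ can be coded as follows. With shape $\alpha > 1$ the hazard rate increases with age (aging), while $\alpha = 1$ recovers the exponential lifetime with a constant hazard rate.

```python
import math


def weibull_reliability(x, shape, scale):
    """Reliability function P(X > x) = 1 - F_X(x) of a Weibull lifetime."""
    return math.exp(-((x / scale) ** shape))


def weibull_hazard(x, shape, scale):
    """Hazard rate h(x) = f_X(x) / (1 - F_X(x)) of a Weibull lifetime."""
    return (shape / scale) * (x / scale) ** (shape - 1)


for age in (5.0, 10.0, 20.0, 30.0):
    aging = weibull_hazard(age, shape=2.0, scale=20.0)     # grows linearly with age
    no_aging = weibull_hazard(age, shape=1.0, scale=20.0)  # constant, equal to 1/scale
    print(f"age {age:4.0f}: h = {aging:.3f} (shape 2), {no_aging:.3f} (shape 1)")
```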

Point unavailability

Following Eq. (6), the length of the $n$th renewal cycle, $T_n$, can be written as

$$T_n = \left(\left\lceil X_n/\tau_I \right\rceil \tau_I + Y_n\right)\mathbf{1}\{X_n \le \tau_M\} + (\tau_M + Z)\,\mathbf{1}\{X_n > \tau_M\}.$$

The $n$th renewal cycle is composed of two disjoint sub-intervals: the system is available in the first and unavailable in the second. The sub-interval of unavailability begins at time $S_{n-1} + X_n$ if $X_n \le \tau_M$, and at $S_{n-1} + \tau_M$ if $X_n > \tau_M$. In summary, the $n$th cycle can be written as a union of the following two disjoint sub-intervals: $[S_{n-1}, S_n) = [S_{n-1}, S_{n-1} + \min(\ldots$
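The cycle-length expression and the start of the unavailable sub-interval can be written as two small functions. This is a sketch: the function names are hypothetical, the detection time of a failure with $X_n \le \tau_M$ is taken to be the next inspection $\lceil X_n/\tau_I \rceil\tau_I$ (as in Fig. 1), and $\tau_M$ is assumed to be a multiple of $\tau_I$ so that this inspection never falls after the PM age.

```python
import math


def cycle_length(x, y, z, tau_i, tau_m):
    """Length T_n of a renewal cycle: CM after detection if X_n <= tau_M, else PM."""
    if x <= tau_m:
        # failure detected at the next inspection, then corrective repair Y_n
        return math.ceil(x / tau_i) * tau_i + y
    # survived to age tau_M: preventive maintenance of duration Z
    return tau_m + z


def unavailable_interval(s_prev, x, y, z, tau_i, tau_m):
    """Unavailable sub-interval [start, S_n) of the cycle that begins at S_{n-1} = s_prev."""
    start = s_prev + (x if x <= tau_m else tau_m)       # failure time or PM age
    end = s_prev + cycle_length(x, y, z, tau_i, tau_m)  # renewal time S_n
    return start, end


# Hypothetical values: cycle begins at S_{n-1} = 100, tau_I = 4, tau_M = 24 months
print(unavailable_interval(100.0, x=9.3, y=0.4, z=0.2, tau_i=4.0, tau_m=24.0))
print(unavailable_interval(100.0, x=30.0, y=0.4, z=0.2, tau_i=4.0, tau_m=24.0))
```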

Example 1: the Weibull aging model

Consider that the lifetime ($X$) of a system has a Weibull distribution with shape parameter $\alpha = 2$ and scale parameter $\beta = 20$ months. The mean time to failure is 17.72 months. The CM and PM durations are exponentially distributed with means $\mu_Y = 0.5$ and $\mu_Z = 0.25$ months, respectively. The planning horizon is 40 months, and the inspection interval is $\tau_I = 4$ months. A numerical algorithm based on the trapezoidal scheme is used to solve the recursive integral equation for point unavailability [16], [17].
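A minimal sketch of how such an equation can be solved numerically follows. It is not the algorithm of [16], [17]. It assumes the renewal-type equation $U(t) = g(t) + \int_0^t U(t-s)\,dF_T(s)$, where $F_T$ is the CDF of the cycle length and $g(t) = P(\min(X,\tau_M) \le t) - F_T(t)$ is the probability of being down at time $t$ within the first cycle (the first downtime starts at $\min(X,\tau_M)$ and ends at $T$). The Stieltjes integral is discretized with a trapezoidal rule on a uniform grid. Exponential CM and PM durations, a Weibull lifetime, and a $\tau_M$ that is a multiple of $\tau_I$ are further assumptions, and the function name `point_unavailability` is hypothetical.

```python
import math


def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def exp_cdf(x, mean):
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0


def point_unavailability(horizon, h, tau_i, shape, scale, mean_y,
                         tau_m=None, mean_z=None):
    """Solve U(t) = g(t) + int_0^t U(t - s) dF_T(s) on the grid t_i = i * h.

    tau_m=None means no PM; otherwise tau_m must be a multiple of tau_i.
    """
    m = None if tau_m is None else round(tau_m / tau_i)  # inspections up to the PM age
    n = int(round(horizon / h))
    ts = [i * h for i in range(n + 1)]

    def cycle_cdf(t):
        # P(T <= t): detection at k * tau_i followed by CM, or PM at age tau_m
        total, k = 0.0, 1
        while k * tau_i <= t and (m is None or k <= m):
            p_k = weibull_cdf(k * tau_i, shape, scale) - weibull_cdf((k - 1) * tau_i, shape, scale)
            total += p_k * exp_cdf(t - k * tau_i, mean_y)
            k += 1
        if tau_m is not None:
            total += (1.0 - weibull_cdf(tau_m, shape, scale)) * exp_cdf(t - tau_m, mean_z)
        return total

    def down_start_cdf(t):
        # P(min(X, tau_m) <= t): the first downtime has started by time t
        if tau_m is not None and t >= tau_m:
            return 1.0
        return weibull_cdf(t, shape, scale)

    F = [cycle_cdf(t) for t in ts]
    g = [down_start_cdf(t) - F[i] for i, t in enumerate(ts)]
    dF = [0.0] + [F[j] - F[j - 1] for j in range(1, n + 1)]

    U = [0.0] * (n + 1)
    U[0] = g[0]
    for i in range(1, n + 1):
        # trapezoidal Riemann-Stieltjes sum; the j = 1 term contains U[i] itself
        acc = g[i] + 0.5 * dF[1] * U[i - 1]
        for j in range(2, i + 1):
            acc += 0.5 * dF[j] * (U[i - j] + U[i - j + 1])
        U[i] = acc / (1.0 - 0.5 * dF[1])
    return ts, U


# Example 1 lifetime and CM parameters without PM: horizon 40 months, step 0.05 months
ts, U = point_unavailability(40.0, 0.05, tau_i=4.0, shape=2.0, scale=20.0, mean_y=0.5)
for t, u in list(zip(ts, U))[::100]:
    print(f"t = {t:5.1f}  U(t) = {u:.4f}")
```

Passing `tau_m` and `mean_z` (for example `tau_m=24.0, mean_z=0.25`) adds the age-based PM branch. Before the first inspection no renewal is possible, so the solution reduces to $U(t) = F_X(t)$ there, which is a quick check of the discretization.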

Consider

Conclusions

Standby safety systems in a nuclear plant serve important safety functions in an emergency, such as a reactor transient or an external hazard. The availability of safety systems is an important input to the probabilistic safety analysis (PSA) of a nuclear plant. Probabilistic modelling of unavailability has served as a basis for determining inspection and maintenance intervals. Traditionally, the effect of aging has been ignored by assuming the equipment lifetime as an exponentially

Acknowledgements

The authors gratefully acknowledge the financial support for this study provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the University Network of Excellence in Nuclear Engineering (UNENE).

Cited by (25)

  • An alternating renewal process to model constellation availability

    2021, Advances in Space Research
    Citation Excerpt :

    This allows us to analyze each of the outage types individually. The proof of this is similar to the one provided by Weide and Pandey (2015) and we write this as Eq. A.1 in the Appendix. This section has described the methods of computing both point and asymptotic availability for a constellation slot.

  • A two-scale maintenance policy for protection systems subject to shocks when meeting demands

    2020, Reliability Engineering and System Safety
    Citation Excerpt :

    Protection systems often play an important role in reducing the risk of critical incidents, hence the importance of providing a satisfactory level of availability for them. Inspection policies and hybrid inspection and preventive replacement/repair policies have been proposed as good alternatives for providing a satisfactory level of availability for protection systems at a reasonable cost [1–3, 5–8, 21, 22, 35–39]. The cited papers addressed different issues observed in problems involving maintenance planning for protection systems; however, all of them proposed maintenance policies based solely on a time scale.

  • Reliability analysis of a cold-standby system considering the development stages and accumulations of failure mechanisms

    2018, Reliability Engineering and System Safety
    Citation Excerpt :

    Several authors [25] have clearly stated the assumption that the cold-standby component did not degrade or fail prior to the operation, whereas in most studies, the default condition was assumed. A special case of a cold-standby system is the standby safety equipment (SSE) [26,27] used in nuclear plants to prevent core damage and radiation leaks. This kind of standby system is used to terminate the nuclear physical reaction in a safe manner; this system differs from the traditional system in that the cold-standby component covers the failed active component and completes the function.

  • Statistical trend tests for resilience of power systems

    2018, Reliability Engineering and System Safety
    Citation Excerpt :

    As there are a lot of missing data for the demand loss in the OE-417 database, further research on advanced data augmentation procedures [31] is need to analyse the data. Furthermore, availability is also an important index for system resilience [32,33]. Future research can focus on testing trends in a system’s availability over time.

  • Determining the inspection intervals for one-shot systems with support equipment

    2018, Reliability Engineering and System Safety
    Citation Excerpt :

    The system is always replaced at interval T, and corrective replacement cost is considered higher than preventive replacement cost in the (B − R) policy, they also found the optimal values of regular time interval T which minimized the cost criterion. Van der Weide and Pandey [6] presented a stochastic alternating renewal process model for a single-unit system. Failures are detected only by periodic inspection, the system is renewed at each inspection time point.
