Automatic creation of Markov models for reliability assessment of safety instrumented systems

https://doi.org/10.1016/j.ress.2007.03.029

Abstract

Since the release of new international functional safety standards such as IEC 61508, the safety and availability of safety instrumented systems (SISs) have received growing attention. Markov analysis is a powerful and flexible technique for assessing the reliability measures of safety instrumented systems, but creating Markov models manually is error-prone and time-consuming. This paper presents a new technique to automatically create Markov models for reliability assessment of safety instrumented systems. Many safety-related factors, such as failure modes, self-diagnostics, repairs, common-cause failures and voting, are included in the Markov models. A framework is first generated from the voting logic, failure modes and self-diagnostics. Repairs and common-cause failures are then incorporated into the framework to build a complete Markov model. Finally, the Markov model can be simplified by state merging. The examples given in this paper show how explosively the size of a Markov model grows as the system becomes only slightly more complicated, and demonstrate the advantage of creating Markov models automatically.

Introduction

IEC 61508, the new international functional safety standard [1], requires that the reliability measures of safety instrumented systems (SISs) be assessed during the safety life cycle to assure safety. The assessment can be carried out with a number of probabilistic analysis techniques, such as fault tree analysis (FTA) [2], [3], reliability block diagrams (RBD) [4], Markov analysis (MA) [3], [5], [6], simplified equations [7], [8] and hybrid methods [9]. Hauge et al. introduced a method called PDS [10] to quantify the safety unavailability and loss of production for safety instrumented systems. The availability of SISs can also be evaluated with probabilistic analysis models. Several authors have compared these techniques and outlined their advantages and disadvantages [11], [12]. MA covers most aspects that affect reliability, is more flexible than the other techniques and is the only one that can describe dynamic transitions among different system states. Hokstad questioned whether the Markov chain approach is appropriate for a system with dormant failures that requires periodic functional tests [13]. He also suggested a “standard” approach, as given e.g. in Chapter 10 of Ref. [14]. Bukowski answered Hokstad's question by modeling and analyzing the effects of periodic tests using Markov models [17]. Markov models can be solved using the methods in Refs. [3], [5], [6], [12], [17]. The numerical technique in Ref. [3], combined with the modeling of periodic tests [17], provides a practical and easy way to calculate reliability measures for safety instrumented systems. However, the size of the Markov model of an SIS increases explosively as the system becomes more complex, and creating Markov models manually is error-prone and time-consuming. These drawbacks may discourage engineers from using MA, even though Markov models can be solved by computer programs.

Compared with manual modeling, automatic Markov modeling is more effective, accurate and convenient, but only a few published papers focus on the topic. Johnson and Butler developed a high-level abstract language to describe the behavior of the fault-tolerant system to be modeled [15]. Houtermans et al. put forward another method, which also relies on descriptive intermediate models [16]. The intermediate models include an RBD and a voting table, which are used to identify dangerous failures and safe failures, respectively. In both approaches, experts are still indispensable for writing the statements of the high-level abstract language or for preparing the RBD and voting tables.

Safety instrumented systems are a special kind of fault-tolerant system that always consists of three subsystems: the sensor part, the logic part and the final element part. Furthermore, each part can be divided into several independent groups, each of which has its own single channel or redundant channels and the corresponding voting logic. Exploiting this decomposability, this paper presents a new technique that automatically creates Markov models for assessing the reliability measures of safety instrumented systems without using any intermediate model. Repair policies and common-cause failures (CCF) are also introduced into the Markov models.

Decomposing the SIS

To perform a specific safety function, a safety instrumented system may need three independent subsystems: sensor, logic and final element. If any one of the three subsystems fails, the SIS cannot function correctly. Accordingly, the three subsystems are related by a logical “or”. The average probability of failure on demand (PFDavg) of a safety function is determined by calculating and combining the PFDavg for all the subsystems that together provide the safety function. The calculation can be
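The logical “or” combination of subsystem results can be sketched as follows. This is a minimal illustration, not the paper's procedure: it treats the three subsystem PFDavg values as independent probabilities, and the function names and numerical values are hypothetical.

```python
from math import prod


def combine_pfd_exact(subsystem_pfds):
    """Series ("or") combination: the function fails if any subsystem fails.

    Assumes the subsystem values can be treated as independent probabilities.
    """
    return 1.0 - prod(1.0 - p for p in subsystem_pfds)


def combine_pfd_sum(subsystem_pfds):
    """Rare-event approximation: simply add the subsystem values."""
    return sum(subsystem_pfds)


# Hypothetical PFDavg values, chosen only for illustration.
pfds = {"sensor": 2.0e-3, "logic": 1.0e-4, "final_element": 5.0e-3}

pfd_exact = combine_pfd_exact(pfds.values())
pfd_sum = combine_pfd_sum(pfds.values())
print(f"exact: {pfd_exact:.6e}, sum approximation: {pfd_sum:.6e}")
```

For small probabilities the simple sum is slightly conservative and differs from the exact combination only in higher-order terms.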

Assumptions

The technique of automated Markov models creation in this paper is based on the following assumptions:

  • All the channels in a voted group have the same failure rate and the same diagnostic coverage.

  • Both failure rates and repair rates are constant.

  • In the initial state of the SIS, all the components that make up the SIS operate successfully.

  • Only a single normal (non-CCF) failure can occur per unit of time.

  • Only one set of multiple failures caused by common cause can occur per unit of time.

  • Single normal failure and

Generating the framework of a Markov model

From this section on, the systems to be modeled are voted groups operating between proof tests. The first step in building a Markov model is to form a framework that contains all the states and normal failures of the Markov model. The framework must have an initial state in which there is no failure at all. Safe failure, dangerous detected failure and dangerous undetected failure are the three categories of failure states of concern. There is another kind of state called intermediate
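One way to picture such a framework is a breadth-first enumeration of the states reachable through single normal failures. The sketch below is an assumption-laden illustration rather than the paper's algorithm: the state encoding (counts of safe, dangerous detected and dangerous undetected channel failures), the MooN classification rule, the choice to stop expanding once the group has failed, and all names and rates are hypothetical.

```python
from collections import deque


def classify(state, m, n):
    """Label a state of an MooN group (hypothetical rule, not from the paper)."""
    safe, dd, du = state
    if state == (0, 0, 0):
        return "initial"
    if safe >= m:  # enough spurious votes to trip the group
        return "safe fail"
    if dd + du >= n - m + 1:  # too few healthy channels left to trip on demand
        return "dangerous detected fail" if dd > 0 else "dangerous undetected fail"
    return "intermediate"


def build_framework(m, n, lam_s, lam_dd, lam_du):
    """Enumerate states and single-failure transitions of an MooN voted group."""
    initial = (0, 0, 0)
    kinds = {initial: "initial"}
    transitions = {}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if kinds[state].endswith("fail"):
            continue  # assumption: no further failures are modeled after group failure
        working = n - sum(state)
        for index, lam in enumerate((lam_s, lam_dd, lam_du)):
            if lam <= 0:
                continue
            nxt = list(state)
            nxt[index] += 1
            nxt = tuple(nxt)
            # Identical channels: any of the working channels may fail next.
            transitions[(state, nxt)] = working * lam
            if nxt not in kinds:
                kinds[nxt] = classify(nxt, m, n)
                queue.append(nxt)
    return kinds, transitions


# Illustrative rates only (per unit of time).
kinds_1oo2, trans_1oo2 = build_framework(1, 2, lam_s=0.5, lam_dd=0.25, lam_du=0.125)
kinds_2oo3, trans_2oo3 = build_framework(2, 3, lam_s=0.5, lam_dd=0.25, lam_du=0.125)
print(len(kinds_1oo2), len(kinds_2oo3))
```

Even under these simplifications the state count grows quickly with the number of channels, which is consistent with the motivation for automating the construction.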

Reparations of detected failures

After component failures are detected by self-diagnostics, they can be repaired immediately. However, more than one detected failure may exist at the same moment. How multiple detected failures are repaired is determined by the repair policy. When safe and dangerous failures are detected simultaneously, a priority must be chosen that decides which category of failure is repaired first. Consider the case that R (R>0, integer) repair teams are available to work on
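A repair policy with a limited number of teams and a category priority could be sketched as below. This is only one plausible reading: it assumes that each team repairs one channel at a time at a constant rate mu, that safe failures count as detected, and that dangerous undetected failures are not repaired between proof tests. The function name, the state encoding (safe, dd, du) and the rates are hypothetical.

```python
def repair_transitions(state, teams, mu, priority=("dd", "safe")):
    """Return {next_state: rate} for repairs leaving `state`.

    state    : (safe, dd, du) counts of failed channels (hypothetical encoding)
    teams    : number of repair teams R (> 0)
    mu       : repair rate of a single team
    priority : order in which failure categories receive repair teams
    """
    safe, dd, du = state
    detected = {"safe": safe, "dd": dd}
    remaining = teams
    result = {}
    for category in priority:
        busy = min(detected[category], remaining)  # teams assigned to this category
        remaining -= busy
        if busy == 0:
            continue
        if category == "safe":
            nxt = (safe - 1, dd, du)
        else:
            nxt = (safe, dd - 1, du)
        # Each busy team finishes at rate mu, so the next completion occurs at busy * mu.
        result[nxt] = busy * mu
    return result


# One safe and two dangerous detected failures, illustrative repair rate.
two_teams = repair_transitions((1, 2, 0), teams=2, mu=0.125)
three_teams = repair_transitions((1, 2, 0), teams=3, mu=0.125)
safe_first = repair_transitions((1, 2, 0), teams=1, mu=0.125, priority=("safe", "dd"))
print(two_teams, three_teams, safe_first)
```

With two teams and dangerous failures prioritized, both teams work on dangerous detected failures and the safe failure waits; a third team picks it up.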

Incorporating CCF into the framework

IEC 61508 introduced a simple technique for handling CCF, called the β-factor model. Failures are divided into two categories, normal and common cause, as shown in Table 1. β is the ratio of the CCF rate to the total failure rate. Two or more components can fail together due to a common cause, but the β-factor model does not distinguish the number of failed components. Hokstad and Corneliussen were aware of this limitation and proposed an improved technique called the multiple beta factor model [18]. As well as β, more
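The rate split of the plain β-factor model can be illustrated with a short sketch. It assumes a single failure mode with total rate λ per channel and a common-cause event that fails all working channels at once; it does not cover the multiple beta factor model, and the function names and values are hypothetical.

```python
def split_beta_factor(lam_total, beta):
    """Split a channel's total failure rate into (normal, common-cause) parts."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return (1.0 - beta) * lam_total, beta * lam_total


def failure_transitions(working, lam_total, beta):
    """Return {number of channels failing together: transition rate}.

    Normal failures take out one channel; under the plain beta-factor model
    a common-cause event takes out every working channel.
    """
    lam_normal, lam_ccf = split_beta_factor(lam_total, beta)
    transitions = {}
    if working >= 1:
        transitions[1] = working * lam_normal
        # With a single working channel both contributions lead to the same state.
        transitions[working] = transitions.get(working, 0.0) + lam_ccf
    return transitions


# Illustrative values: lambda = 1e-6 per hour, beta = 10 %.
two_channels = failure_transitions(2, 1.0e-6, 0.1)
one_channel = failure_transitions(1, 1.0e-6, 0.1)
print(two_channels, one_channel)
```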

Merging states

Once repairs and CCF have been incorporated, the framework becomes a complete Markov model that is ready to be solved. Before solving the model, it can be simplified by merging states in order to reduce the computational burden. In 1987, Shooman and Laemmel suggested a method for merging the states of Markov models [19]. States that have identical transition rates to common states can be merged into one. Entry rates are added and exit rates remain the same.
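The merging rule can be sketched as a single pass over a transition table. This is an illustrative sketch, not the procedure of Ref. [19]: it additionally requires merged states to carry the same label, so that, for example, safe and dangerous failure states are never combined, and it does not iterate until no further merges are possible. The data layout and names are hypothetical.

```python
from collections import defaultdict


def merge_states(rates, kinds):
    """Merge states with the same label and identical exit rates to common states.

    rates : {(source, destination): rate}
    kinds : {state: label}
    Returns the merged rate table and a map from each state to its representative.
    """
    exits = defaultdict(dict)
    for (src, dst), rate in rates.items():
        exits[src][dst] = rate

    groups = defaultdict(list)
    for state, label in kinds.items():
        signature = (label, frozenset(exits[state].items()))
        groups[signature].append(state)

    representative = {}
    for members in groups.values():
        for state in members:
            representative[state] = members[0]

    merged = defaultdict(float)
    for (src, dst), rate in rates.items():
        if representative[src] != src:
            continue  # exit rates stay the same: keep only the representative's exits
        merged[(src, representative[dst])] += rate  # entry rates are added
    return dict(merged), representative


# Small hypothetical model: B and C both leave to F at the same rate.
example_kinds = {"A": "initial", "B": "intermediate", "C": "intermediate",
                 "F": "dangerous fail"}
example_rates = {("A", "B"): 0.25, ("A", "C"): 0.5,
                 ("B", "F"): 0.125, ("C", "F"): 0.125}
merged_rates, rep = merge_states(example_rates, example_kinds)
print(merged_rates)
```

In the example, B and C collapse into one state entered from A at the summed rate 0.75 and left toward F at the unchanged rate 0.125.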

Examples

A computer program has been developed to realize the technique presented in this paper. Some modeling results are illustrated in this section.

Conclusion

MA covers most aspects that affect reliability, is more flexible than the other techniques and describes dynamic transitions among different states. However, the size of the Markov model of an SIS increases explosively as the system becomes more complex, and creating Markov models manually is error-prone and time-consuming. Safety instrumented systems are a special kind of fault-tolerant system. They can always be decomposed into subsystems and groups, among which simple and straightforward

Acknowledgment

This work was financially supported by the National Natural Science Foundation of China under grant no. 60674064.
