Automatic creation of Markov models for reliability assessment of safety instrumented systems

doi:10.1016/j.ress.2007.03.029

Reliability Engineering & System Safety

Volume 93, Issue 6, June 2008, Pages 829-837

https://doi.org/10.1016/j.ress.2007.03.029

Abstract

Since the release of new international functional safety standards such as IEC 61508, increasing attention has been paid to the safety and availability of safety instrumented systems (SISs). Markov analysis is a powerful and flexible technique for assessing the reliability measures of safety instrumented systems, but creating Markov models manually is error-prone and time-consuming. This paper presents a new technique for automatically creating Markov models for reliability assessment of safety instrumented systems. Many safety-related factors, such as failure modes, self-diagnostics, restoration, common-cause failures and voting, are included in the Markov models. A framework is first generated from the voting logic, failure modes and self-diagnostics. Repairs and common-cause failures are then incorporated into the framework to build a complete Markov model. Finally, the Markov model can be simplified by merging states. The examples given in this paper show how explosively the size of a Markov model grows as the system becomes only slightly more complicated, and demonstrate the advantages of creating Markov models automatically.

Introduction

In accordance with IEC 61508, the new international functional safety standard [1], the reliability measures of safety instrumented systems (SISs) must be assessed during the safety life cycle to assure safety. The assessment can be carried out with a number of probabilistic analysis techniques, such as fault tree analysis (FTA) [2], [3], reliability block diagrams (RBD) [4], Markov analysis (MA) [3], [5], [6], simplified equations [7], [8] and hybrid methods [9]. Hauge et al. introduced a method called PDS [10] to quantify the safety unavailability and loss of production for safety instrumented systems. The availability of SISs can also be evaluated with probabilistic analysis models. Several authors have compared these techniques and outlined their advantages and disadvantages [11], [12]. MA covers most aspects that affect reliability, is more flexible than the other techniques and is the only one of them that can describe dynamic transitions among different system states. Hokstad questioned whether the Markov chain approach is appropriate for a system with dormant failures that requires periodic functional tests [13]. He also suggested a “standard” approach, as given e.g. in Chapter 10 of Ref. [14]. Bukowski answered Hokstad's question by modeling and analyzing the effects of periodic tests with Markov models [17]. Markov models can be solved using the methods in Refs. [3], [5], [6], [12], [17]. The numerical technique in Ref. [3], combined with the modeling of periodic tests [17], provides a practical and easy way to calculate reliability measures for safety instrumented systems. However, the size of the Markov model of an SIS increases explosively as the system becomes more complex, and creating Markov models manually is error-prone and time-consuming. These disadvantages may discourage engineers from using MA, even though Markov models can be solved by computer programs.

Compared with manual modeling, automatic Markov modeling is more effective, accurate and convenient, but only a few published papers focus on this topic. Johnson and Butler developed a high-level abstract language to describe the behavior of the fault-tolerant system to be modeled [15]. Houtermans et al. put forward another method, which also relies on descriptive intermediate models [16]. The intermediate models comprise an RBD and a voting table, which are used to identify dangerous failures and safe failures, respectively. Both approaches still require experts to write the statements of the high-level abstract language or to prepare the RBD and voting tables.

Safety instrumented systems are a special kind of fault-tolerant system that always consists of three subsystems: the sensor part, the logic part and the final element part. Furthermore, each part can be divided into several independent groups, each of which has a single channel or redundant channels with the corresponding voting logic. Exploiting this decomposability, this paper presents a new technique that automatically creates Markov models for assessing the reliability measures of safety instrumented systems without using any intermediate model. Repair policies and common-cause failures (CCF) are also incorporated into the Markov models.

Section snippets

Decomposing the SIS

To perform a specific safety function, a safety instrumented system may need three independent subsystems: sensor, logic and final element. If any one of the three subsystems fails, the SIS cannot function correctly. Accordingly, the failures of the three subsystems are related by a logical “or”. The average probability of failure on demand (PFD_avg) of a safety function is determined by calculating and combining the PFD_avg values of all the subsystems that together provide the safety function. The calculation can be
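
The snippet above breaks off before the combination rule is stated. As an illustration only, the following sketch assumes the usual treatment of independent subsystems in series: the exact "or" combination 1 - (1 - PFD_S)(1 - PFD_L)(1 - PFD_FE) and its rare-event approximation, the plain sum. The function name `combine_pfd` and the numerical values are hypothetical and are not taken from the paper.

```python
def combine_pfd(pfd_sensor, pfd_logic, pfd_final):
    """Combine subsystem PFD_avg values for a series (logical OR) structure.

    Assumes the three subsystems fail independently. Returns both the
    exact OR combination and the rare-event approximation (simple sum).
    """
    exact = 1.0 - (1.0 - pfd_sensor) * (1.0 - pfd_logic) * (1.0 - pfd_final)
    approx = pfd_sensor + pfd_logic + pfd_final
    return exact, approx


# Illustrative (made-up) subsystem values.
exact, approx = combine_pfd(1e-3, 1e-4, 5e-3)
print(f"exact OR combination: {exact:.6e}")
print(f"sum approximation:    {approx:.6e}")
```

For small PFD_avg values the two results differ only in the higher-order product terms, and the sum is slightly conservative.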

Assumptions

The technique for automated creation of Markov models in this paper is based on the following assumptions:

• All the channels in a voted group have the same failure rate and the same diagnostic coverage.
• Both failure rates and repair rates are constant.
• In the initial state of the SIS, all the components that make up the SIS operate successfully.
• Only a single normal (non-CCF) failure can occur per unit of time.
• Only one set of multiple failures caused by a common cause can occur per unit of time.
• Single normal failure and

Generating the framework of a Markov model

From this section on, the systems to be modeled are voted groups, considered during their operation between proof tests. The first step in building a Markov model is to form a framework that contains all the states and normal failures of the model. The framework must have an initial state in which no failure has occurred. Safe failure, dangerous detected failure and dangerous undetected failure are the three categories of failure states of concern. There is another kind of state called intermediate
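
To give a feeling for how such a framework might be enumerated, here is a minimal sketch under stated assumptions, not the paper's algorithm. Because all channels in a voted group are assumed identical, a state is represented here by the tuple (n_s, n_dd, n_du), the numbers of channels that have failed safe, dangerous detected and dangerous undetected. Only single normal failures are generated. The sketch does not apply the voting logic to stop expansion at system failure states, and it does not model the intermediate states mentioned above. The name `build_framework` and the rate values are hypothetical.

```python
from itertools import product


def build_framework(n_channels, lam_s, lam_dd, lam_du):
    """Enumerate count-based states and single-failure transitions.

    A state is (n_s, n_dd, n_du); the remaining channels are healthy.
    Each healthy channel can fail safe, dangerous detected or dangerous
    undetected, so the transition rate is n_ok times the per-channel rate.
    """
    states = [
        (s, dd, du)
        for s, dd, du in product(range(n_channels + 1), repeat=3)
        if s + dd + du <= n_channels
    ]
    transitions = {}
    for s, dd, du in states:
        n_ok = n_channels - (s + dd + du)
        if n_ok == 0:
            continue  # no healthy channel left, nothing more can fail
        transitions[((s, dd, du), (s + 1, dd, du))] = n_ok * lam_s
        transitions[((s, dd, du), (s, dd + 1, du))] = n_ok * lam_dd
        transitions[((s, dd, du), (s, dd, du + 1))] = n_ok * lam_du
    return states, transitions


# Illustrative (made-up) per-channel failure rates, per hour.
for n in (1, 2, 3, 4):
    states, transitions = build_framework(n, 1e-6, 2e-6, 5e-7)
    print(n, "channels:", len(states), "states,", len(transitions), "transitions")
```

Even in this reduced form the state count grows quickly with the number of channels, which is the growth the paper refers to when it describes the model size as increasing explosively.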

Reparations of detected failures

After component failures are detected by self-diagnostics, they can be repaired immediately. However, more than one detected failure may exist at the same time. How multiple detected failures are repaired is determined by the repair policy. When safe and dangerous failures are detected simultaneously, a priority must be chosen that specifies which category of failure is repaired first. Consider the case that R (R>0, integer) repair teams are available to work on
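
The snippet ends before the policy is spelled out, so the following is only one plausible reading, offered as a sketch. It assumes a state tuple (n_s, n_dd, n_du) counting safe, dangerous detected and dangerous undetected failed channels, that every safe failure is detected, that each team repairs one channel at a constant rate mu, and that teams are assigned to the prioritized category first. The function name `repair_transitions` and its parameters are hypothetical.

```python
def repair_transitions(state, teams, mu, priority="dangerous"):
    """Return {next_state: rate} for repairs of detected failures.

    state    -- (n_s, n_dd, n_du); undetected failures are never repaired here
    teams    -- number of repair teams R (positive integer)
    mu       -- repair rate of a single team working on a single channel
    priority -- "dangerous" or "safe": the category served first
    """
    n_s, n_dd, n_du = state
    if priority == "dangerous":
        on_dd = min(teams, n_dd)
        on_s = min(teams - on_dd, n_s)
    else:
        on_s = min(teams, n_s)
        on_dd = min(teams - on_s, n_dd)

    out = {}
    if on_s:
        out[(n_s - 1, n_dd, n_du)] = on_s * mu
    if on_dd:
        out[(n_s, n_dd - 1, n_du)] = on_dd * mu
    return out


# One safe and two dangerous detected failures, two teams, 8 h mean repair time.
print(repair_transitions((1, 2, 0), teams=2, mu=0.125))
print(repair_transitions((1, 2, 0), teams=2, mu=0.125, priority="safe"))
```

With dangerous failures prioritized, both teams work on dangerous detected failures and the safe failure waits. With safe failures prioritized, one team is assigned to each category.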

Incorporating CCF into the framework

IEC 61508 introduced a simple technique for handling CCF, called the β-factor model. Failures are divided into two categories, normal and common cause, as shown in Table 1. β is the ratio of the CCF rate to the total failure rate. Two or more components can fail together due to a common cause, but the β-factor model does not distinguish how many components fail. Hokstad and Corneliussen recognized this limitation and proposed an improved technique called the multiple beta factor model [18]. As well as β, more
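
Since β is defined as the ratio of the CCF rate to the total failure rate, the split of a total rate λ follows directly: the common-cause part is βλ and the normal part is (1 - β)λ. A minimal sketch of this arithmetic is given here. The function name `split_beta` and the example values are hypothetical, and the multiple beta factor extension is not covered.

```python
def split_beta(lam_total, beta):
    """Split a total failure rate into normal and common-cause parts.

    beta is the fraction of the total failure rate attributed to
    common-cause failures, so 0 <= beta <= 1.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    lam_normal = (1.0 - beta) * lam_total
    lam_ccf = beta * lam_total
    return lam_normal, lam_ccf


# Illustrative (made-up) values: total rate 1e-6 per hour, beta = 10 %.
lam_normal, lam_ccf = split_beta(1e-6, 0.1)
print(f"normal failure rate: {lam_normal:.3e} per hour")
print(f"CCF rate:            {lam_ccf:.3e} per hour")
```

In a Markov model the normal part would drive single-channel transitions, while the common-cause part would drive a transition in which several channels fail together.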

Merging states

Once repairs and CCF have been incorporated, the framework becomes a complete Markov model ready to be solved. Before solving the model, it can be simplified by merging states in order to reduce the computational burden. Shooman and Laemmel suggested a method for merging the states of Markov models in 1987 [19]. States that have identical transition rates to common states can be merged into one. Entry rates are added and exit rates remain the same.
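
The merging rule can be illustrated with a small sketch. It assumes the model is stored as a dictionary mapping (source, target) pairs to transition rates, and it merges two states only when their exit rates to every other state are identical. The helper names `can_merge` and `merge_states`, the state labels and the rates are hypothetical, and the code is a reading of the rule stated above rather than the procedure of Ref. [19].

```python
def can_merge(rates, a, b):
    """True if a and b have identical exit rates to all other states."""
    targets = {j for (i, j) in rates if i in (a, b)} - {a, b}
    # Exact float comparison: rates are assumed to come from the same inputs.
    return all(rates.get((a, k), 0.0) == rates.get((b, k), 0.0) for k in targets)


def merge_states(rates, a, b, merged):
    """Merge states a and b into a new state named `merged`.

    Entry rates into a and b are added; exit rates are taken once
    (they are identical for a and b by the merge condition).
    """
    if not can_merge(rates, a, b):
        raise ValueError("states do not have identical exit rates")
    out = {}
    for (i, j), r in rates.items():
        if i in (a, b) and j in (a, b):
            continue  # transition inside the merged state disappears
        if i == b:
            continue  # exit rates remain the same: keep the copy from a
        src = merged if i == a else i
        dst = merged if j in (a, b) else j
        out[(src, dst)] = out.get((src, dst), 0.0) + r
    return out


lam, mu = 1e-6, 0.125  # made-up failure and repair rates
model = {
    ("OK", "A"): lam, ("OK", "B"): lam,   # either channel fails first
    ("A", "F"): lam, ("B", "F"): lam,     # the remaining channel fails
    ("A", "OK"): mu, ("B", "OK"): mu,     # repair back to the initial state
}
reduced = merge_states(model, "A", "B", "1F")
print(reduced)
```

In this two-channel example the states "A" and "B" collapse into "1F": the entry rate from "OK" becomes the sum of the two original entry rates, while the exit rates to "F" and "OK" are unchanged.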

Examples

A computer program has been developed to implement the technique presented in this paper. Some modeling results are illustrated in this section.

Conclusion

MA covers most aspects that affect reliability, is more flexible than the other techniques and describes dynamic transitions among different states. However, the size of the Markov model of an SIS increases explosively as the system becomes more complex, and creating Markov models manually is error-prone and time-consuming. Safety instrumented systems are a special kind of fault-tolerant system. They can always be decomposed into subsystems and groups, among which simple and straightforward

Acknowledgment

This work was financially supported by the National Natural Science Foundation of China under Grant No. 60674064.

References (19)

H. Guo et al. A simple reliability block diagram method for safety integrity verification. Reliab Eng Sys Safety (2007)
J. Bukowski et al. Using Markov models for safety analysis of programmable electronic systems. ISA Trans (1995)
T. Zhang et al. Availability of systems with self-diagnostic components—applying Markov model to IEC 61508-6. Reliab Eng Sys Safety (2003)
A. Summers. Viewpoint on ISA TR84.0.02—simplified methods and fault tree analysis. ISA Trans (2000)
B. Knegtering et al. Application of micro Markov models for quantitative safety assessment to determine safety integrity levels as defined by the IEC 61508 standard for functional safety. Reliab Eng Sys Safety (1999)
J. Rouvroye et al. New quantitative safety standards: different techniques, different results? Reliab Eng Sys Safety (1999)
P. Hokstad et al. Loss of safety assessment and the IEC 61508 standard. Reliab Eng Sys Safety (2004)
IEC 61508, Functional safety of electrical/electronic/programmable electronic safety-related systems. International...
L. Beckman. Easily assess complex safety loops. Chem Eng Prog (2001)

There are more references available in the full text version of this article.

