How to interpret safety critical failures in risk and reliability assessments

https://doi.org/10.1016/j.ress.2017.01.003

Abstract

Management of safety systems often receives high attention due to the potential for industrial accidents. In risk and reliability literature concerning such systems, and particularly concerning safety-instrumented systems, one frequently comes across the term ‘safety critical failure’. It is associated with the term ‘critical failure’, and it is often deduced that a safety critical failure refers to a failure occurring in a safety critical system. Although this is correct in some situations, it does not match, for example, the mathematical definition given in ISO/TR 12489:2013 on reliability modeling, where a clear distinction is made between ‘safe failures’ and ‘dangerous failures’. In this article, we show that different interpretations of the term ‘safety critical failure’ exist, and that there is room for misinterpretation and misunderstanding in risk and reliability assessments where failure information linked to safety systems is used, which could influence decision-making. The article gives some examples from the oil and gas industry, showing different possible interpretations of the term. In particular, we discuss the link between criticality and failure. The article points in general to the importance of adequate risk communication when using the term, and gives some clarification on its interpretation in risk and reliability assessments.

Introduction

Industrial failure events occurring in safety systems could, depending on the system, lead to severe consequences. However, in many situations we find the terminology referring to such events to be rather vague. For example, the term ‘safety critical failure’ could have different meanings, depending on what ‘critical’ refers to. It could refer to an undetected unsafe state with the possibility of severe consequences, or simply express that a part of the safety system has lost its functionality, i.e. the failure in itself is categorized as critical (cf. [12]). There are also other interpretations.

In this article we focus on the meaning of the term ‘safety critical failure’ as a subset of the failures occurring in ‘safety critical systems’. Currently, in literature and practice within the risk and reliability area, both of these terms are open to interpretation, and there is a need for clarification.

Nevertheless, there is a common understanding that failures occurring in ‘safety critical systems’ may raise safety concerns, including possible harm to humans or the environment. It thus intuitively makes sense to label the failures that could cause severe consequences as ‘safety critical’. Many publications establish such a link, although with different levels of specificity. For example, Isaksen et al. [10] associate the term with failure events that can contribute to accidents. The definition could be further extended by including system states that produce the possibility of accidents (i.e. dangerous failures). Consider a redundant system with two components, A and B; for example, two emergency shutdown valves. If a failure occurs on Component A, then the impact could be very different depending on the state of Component B. The state of Component B could determine whether or not the system is conditioned for severe consequences, and thus matters for the criticality.
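The redundancy example can be made concrete with a minimal sketch. It assumes a one-out-of-two (1oo2) arrangement, in which either valve alone is sufficient to shut down; that voting logic, and all names in the code, are illustrative assumptions and are not specified in the text.

```python
from enum import Enum


class State(Enum):
    UP = "up"      # component can perform its safety function
    DOWN = "down"  # component cannot perform its safety function


def safety_function_available(a: State, b: State) -> bool:
    """Assumed 1oo2 logic: one working valve is enough to shut down."""
    return a is State.UP or b is State.UP


def failure_of_a_is_system_critical(b: State) -> bool:
    """True if a failure of A removes the safety function, given B's state."""
    return not safety_function_available(State.DOWN, b)


if __name__ == "__main__":
    for b in State:
        critical = failure_of_a_is_system_critical(b)
        print(f"B is {b.value}: failure of A critical at system level? {critical}")
```

Under this assumption the same component failure is critical at system level only when Component B is already down, which is the dependence on system state described above.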

Vinnem [34], as another example linking failure events and criticality, refers to failure events on safety-related equipment by the term ‘safety critical failures’; for example, if an emergency shutdown valve does not close on demand, this is considered a ‘safety critical failure’. Such a link works fine for a single-component system, but is not necessarily applicable to, for example, the redundant system considered above. There is then a need to address the relevant system in more detail and understand how the critical and dangerous failures might occur. Such a definition is provided in Hauge and Lundteigen [2], where the term is defined as “a failure that prevents the component to perform its safety function, i.e. to bring the process to a predefined safe state”.

There are also definitions going beyond the impact of the failure events. Reference [12], on reliability data collection, defines ‘safety critical failures’ as: “critical dangerous failures that are undetected”. This interpretation gives no reference to what the acceptable consequences are, but instead focuses on the ability to detect a so-called critical dangerous failure (i.e. the component cannot carry out some safety function) within the safety system. This interpretation is based on the PDS method (see for example [30]) applied within the oil and gas industry for reliability assessment of safety-instrumented systems on the Norwegian Continental Shelf (NCS).

Following IEC 61508 [8], reference [18] provides a clear distinction between two types of possible failures in ‘safety critical systems’, i.e. ‘safe’ and ‘dangerous’. The ‘dangerous’ type may be referred to as ‘safety critical failures’. This is a common distinction applied in reliability calculation of safety-instrumented systems (see for example [30], [8]). Here the functionality of the equipment plays a key role and must be taken into consideration. It means that the system configuration and condition monitoring are important for whether or not, for example, a safety valve failure is classified as safety critical.

The interpretation given above stands in strong contrast to the alternative, which claims that ‘safety critical failure’ refers to a failure in a safety system where the functionality of the valve is lost (for example [34]).

Many of the publications dealing with safety critical failures, however, fail or see no need to clarify the term properly, and seem to assume that it is self-explanatory. It is then left to the reader to interpret the meaning, as in for example [3], [11], [19], [20], [21], [33]. The same applies to, for example, NORSOK standard Z-008 [25] on risk based maintenance and consequence classification, where the term ‘safety critical failure’ is used in several places, but defined nowhere in the standard.

If the term ‘safety critical failure’ referred to any failure that could lead to unacceptable consequences, and there were consensus on this interpretation, then there would be no problems with the current understanding and use of the term. However, as the publications cited above show, that is not the situation: reliability analysts currently interpret the term differently, and depending on the perspective, a significantly different population of items and failure events could be relevant in a risk or reliability analysis.
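To illustrate how the choice of interpretation changes the population of events, the sketch below applies three readings loosely based on those discussed above to a small set of invented failure records. The records, field names, and predicates are hypothetical and serve only as an illustration; they are not data or definitions taken from any of the cited sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    tag: str                # hypothetical equipment tag
    in_safety_system: bool  # failure occurred on safety-related equipment
    function_lost: bool     # component cannot perform its safety function
    detected: bool          # failure has been revealed, e.g. by diagnostics


# Invented records, for illustration only.
RECORDS = [
    FailureRecord("ESV-1", True, True, False),
    FailureRecord("ESV-2", True, True, True),
    FailureRecord("ESV-3", True, False, True),
    FailureRecord("PUMP-1", False, True, True),
]

# Three simplified readings of 'safety critical failure' (assumed predicates).
INTERPRETATIONS = {
    "any failure in a safety system": lambda r: r.in_safety_system,
    "safety function lost": lambda r: r.in_safety_system and r.function_lost,
    "dangerous and undetected": lambda r: (
        r.in_safety_system and r.function_lost and not r.detected
    ),
}


def count_by_interpretation(records):
    """Count how many records each reading would label 'safety critical'."""
    return {
        name: sum(1 for r in records if predicate(r))
        for name, predicate in INTERPRETATIONS.items()
    }


if __name__ == "__main__":
    for name, count in count_by_interpretation(RECORDS).items():
        print(f"{name}: {count}")
```

With these invented records the three readings select three, two, and one event respectively, so an analysis based on the same raw data would start from a different failure population depending on the reading adopted.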

In general, we have identified four specific aspects associated with criticality where the interpretation of a ‘safety critical failure’ may differ. These are listed in Table 1.

In the ensuing sections of this article, these aspects are addressed in more detail, and we discuss how the interpretation of the term ‘safety critical failure’ may vary. The oil and gas industry is the main industry considered in this article, and references from it are used to show how different perspectives may produce different interpretations of the term. In particular, we discuss how misinterpretations may lead to different conclusions in decision-making, and how regulations and guidelines influence the use of the term. Some specific advice is also given on the recommended interpretation of the term in risk and reliability assessments.

How to understand the criticality in safety critical failures

Based on the four aspects listed in Table 1, several interpretations are possible for the term ‘safety critical failure’. The interpretation strongly depends on how we understand the ‘critical’ part of the term. Currently, it is not sufficiently clear what the criticality refers to when dealing with such failures. For example, it may relate to some important part of the system, or it could refer to some undesired state. All four aspects listed influence the criticality in some way.

Discussion

As shown in Table 1, the term ‘safety critical failure’ can have various interpretations, depending on what is ‘critical’. Section 2 of this article has shown that the different aspects listed in the table are themselves open to interpretation, and even a disciplinary term such as ‘safety’ may be open for discussion (cf. the definitions given in Section 2.1.1).

Nevertheless, it is considered reasonable that ‘safety critical failure’ links failure events to safety in the same manner as the

Recommended interpretation

When dealing with safety systems, a main focus is to achieve successful safety performance. This means that any failure that leads to significantly reduced safety performance, and thus increases the risk of hazardous events, may be considered critical to the safety system, in the sense that one or several safety functions, and thus the protection against such events, are down.

The term ‘critical failure’ has a sound mathematical definition; that is a failure bringing the system from an up state to a down

Conclusion

A main objective of this article has been to provide some clarification regarding misinterpretations and misunderstandings when using the term ‘safety critical failure’. The main critique is that the meaning of this term is not sufficiently clear. This is an issue that relates to the importance of adequate risk communication. It also relates to the existence of several different interpretations, each with meaningful content.

Besides, there are several associated terms, such as for example ‘critical failure’, ‘dangerous

Acknowledgements

The authors are grateful to three anonymous reviewers for their useful comments and suggestions to the original version of this article.

References (36)

  • IEC 60300-3-11:2009. Dependability management - Part 3-11: Application guide – Reliability centred...
  • IEC 61508:2010. Functional safety of electrical/electronic/programmable electronic safety-related systems – All...
  • IEC 61511:2016 Functional safety - Safety instrumented systems for the process industry sector – All...
  • Isaksen U, Bowen JP, Nissanke N. 1996. System and Software Safety in Critical Systems. Computer Science Department...
  • Isermann R, Schwarz R, Stolzl S. 2002. Fault-tolerant drive-by-wire systems. IEEE Control Systems Magazine, October...
  • ISO 14224:2016. Petroleum, petrochemical and natural gas industries - Collection and exchange of reliability and...
  • ISO 11014:2009. Safety data sheet for chemical products - Content and order of...
  • ISO 19906:2010. Petroleum and natural gas industries - Arctic offshore...

1 France: Project leader of ISO/TR 12489.