How to interpret safety critical failures in risk and reliability assessments
Introduction
Industrial failure events occurring in safety systems could, depending on the system, lead to severe consequences. However, in many situations the terminology used for such events is rather vague. For example, the term ‘safety critical failure’ can have different meanings, depending on what ‘critical’ refers to. It could refer to an undetected unsafe state with the possibility of severe consequences, or simply express that a part of the safety system has lost its functionality, i.e. the failure in itself is categorized as critical (cf. [12]). There are also other interpretations.
In this article we focus on the meaning of the term ‘safety critical failure’ as a subset of the failures occurring in ‘safety critical systems’. Currently, in both literature and practice within the risk and reliability area, both of these terms are open to interpretation, and there is a need for clarification.
Nevertheless, there is a common understanding that failures occurring in ‘safety critical systems’ may raise safety concerns, including possible harm to humans or the environment. It therefore makes intuitive sense to label the failures that could cause severe consequences as ‘safety critical’. Many publications establish such a link, although with different levels of specificity. For example, Isaksen et al. [10] associate the term with failure events that can contribute to accidents. The definition could be further extended to include system states that create the possibility of accidents (i.e. dangerous failures). Consider a redundant system with two components A and B, for example two emergency shutdown valves. If a failure occurs on component A, its impact could be very different depending on the state of component B. The state of component B matters for whether or not the system is in a condition where severe consequences can occur, and thus matters for the criticality.
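To make the A/B example concrete, the following minimal sketch assumes that the two shutdown valves form a 1-out-of-2 (1oo2) shutdown function, i.e. the system is up as long as at least one valve can close on demand. The 1oo2 assumption and the names `ValveState`, `system_is_up` and `failure_of_a_takes_system_down` are hypothetical illustrations, not taken from any of the cited methods. The sketch checks whether a failure of A moves the system from an up state to a down state, given the state of B.

```python
from enum import Enum


class ValveState(Enum):
    OK = "able to close on demand"
    FAILED = "unable to close on demand"


def system_is_up(a: ValveState, b: ValveState) -> bool:
    """Assumed 1oo2 logic: the shutdown function is up if at least one valve can close."""
    return a is ValveState.OK or b is ValveState.OK


def failure_of_a_takes_system_down(b: ValveState) -> bool:
    """True if a failure of valve A moves the system from an up state to a down state."""
    up_before = system_is_up(ValveState.OK, b)
    up_after = system_is_up(ValveState.FAILED, b)
    return up_before and not up_after


# Evaluate the same component failure (valve A) for each possible state of valve B
for b_state in ValveState:
    print(f"B {b_state.name}: failure of A takes system down = "
          f"{failure_of_a_takes_system_down(b_state)}")
```

With B intact, the failure of A leaves the shutdown function available; with B already failed, the same component failure removes the protection entirely. A purely component-level label would treat both cases identically.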
As another example of linking failure events and criticality, Vinnem [34] refers to failure events on safety-related equipment as ‘safety critical failures’; for example, if an emergency shutdown valve does not close on demand, this is considered a ‘safety critical failure’. Such a link works well for a single-component system, but is not necessarily applicable to, for example, the redundant system considered above. There is then a need to address the relevant system in more detail and to understand how critical and dangerous failures might occur. Such a definition is provided in Hauge and Lundteigen [2], where the term is defined as “a failure that prevents the component to perform its safety function, i.e. to bring the process to a predefined safe state”.
There are also definitions that go beyond the impact of the failure events. For example, the work on reliability data collection in [12] defines ‘safety critical failures’ as “critical dangerous failures that are undetected”. This interpretation makes no reference to what consequences are acceptable, but instead focuses on the ability to detect a so-called critical dangerous failure (i.e. the component cannot carry out some safety function) within the safety system. It is based on the PDS method (see for example [30]) applied within the oil and gas industry for reliability assessment of safety-instrumented systems on the Norwegian Continental Shelf (NCS).
Following IEC 61508 [8], [18] makes a clear distinction between two types of possible failures in ‘safety critical systems’: ‘safe’ and ‘dangerous’. The ‘dangerous’ type may be referred to as ‘safety critical failures’. This distinction is commonly applied in reliability calculations for safety-instrumented systems (see for example [30], [8]). Here the functionality of the equipment plays a key role and must be taken into consideration. This means that the system configuration and condition monitoring matter for whether or not, for example, a safety valve failure is classified as safety critical.
This interpretation stands in strong contrast to the alternative view that a ‘safety critical failure’ refers to a failure in a safety system where the functionality of the valve is lost (for example [34]).
Many of the publications dealing with safety critical failures, however, fail or see no need to clarify the term properly, and seem to assume that it is self-explanatory. It is then left to the reader to interpret the meaning, as for example in [3], [11], [19], [20], [21], [33]. The same applies to, for example, NORSOK standard Z-008 [25] on risk based maintenance and consequence classification, where the term ‘safety critical failure’ is used in several places but defined nowhere in the standard.
If the term ‘safety critical failure’ referred to any failure that could lead to unacceptable consequences, and there were consensus on this interpretation, there would be no problem with the current understanding and use of the term. However, as cited above, that is not the case: reliability analysts currently interpret the term differently, and depending on the perspective, significantly different populations of items and failure events could be relevant in a risk or reliability analysis.
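To illustrate how the choice of interpretation changes the population of failure events, the minimal sketch below counts the same failure log under three of the readings discussed above: any failure in a safety critical system, a dangerous failure in the safe/dangerous sense following IEC 61508 [8], [18], and a dangerous undetected failure as in the PDS-based definition [12]. The records, their descriptions and the two boolean fields are hypothetical; a real classification would also need the system-level considerations (configuration, redundancy) discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    description: str  # hypothetical failure description
    dangerous: bool   # True if the component cannot perform its safety function
    detected: bool    # True if revealed immediately by diagnostics or condition monitoring


# Hypothetical failure log for equipment in a safety critical system
LOG = [
    FailureRecord("ESD valve 1 fails to close on demand (found at proof test)", True, False),
    FailureRecord("ESD valve 2 spurious closure", False, True),
    FailureRecord("Pressure transmitter frozen signal, flagged by diagnostics", True, True),
    FailureRecord("Gas detector zero drift, not alarmed", True, False),
]

# Each interpretation is a predicate deciding whether a record is 'safety critical'
INTERPRETATIONS = {
    "any failure in a safety critical system": lambda r: True,
    "dangerous failure (safe/dangerous split)": lambda r: r.dangerous,
    "dangerous undetected failure": lambda r: r.dangerous and not r.detected,
}


def count_by_interpretation(log):
    """Number of records labelled 'safety critical' under each interpretation."""
    return {name: sum(1 for r in log if rule(r)) for name, rule in INTERPRETATIONS.items()}


for name, n in count_by_interpretation(LOG).items():
    print(f"{name}: {n} of {len(LOG)}")
```

On this small log the three readings label 4, 3 and 2 of the same 4 events as ‘safety critical’, which is exactly the kind of divergence that can lead to different conclusions when results are compared across analyses.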
In general, we have identified four specific aspects associated with criticality where interpretations of a ‘safety critical failure’ may differ. These are listed in Table 1.
In the ensuing sections of this article, these aspects are addressed in more detail, together with a discussion of how the interpretation of the term ‘safety critical failure’ may vary. References from the oil and gas industry, the main industry considered in this article, are used to show how different perspectives may produce different interpretations of the term. In particular, we discuss how misinterpretations may lead to different conclusions in decision-making, and how regulations and guidelines influence the use of the term. Some specific advice is also given on the recommended interpretation of the term in risk and reliability assessments.
How to understand the criticality in safety critical failures
Based on the four aspects listed in Table 1, several interpretations of the term ‘safety critical failure’ are possible. The interpretation strongly depends on how we understand the ‘critical’ part of the term. Currently, it is not sufficiently clear what the criticality refers to when dealing with such failures. For example, it may relate to some important part of the system, or it may refer to some undesired state. All four aspects listed influence the criticality in some way.
Discussion
As shown in Table 1, the term ‘safety critical failure’ can have various interpretations, depending on what is ‘critical’. Section 2 of this article has also shown that the different aspects listed in the table are themselves open to interpretation, and that even a disciplinary term such as ‘safety’ may be open to discussion (cf. the definitions given in Section 2.1.1).
Nevertheless, it is considered reasonable that ‘safety critical failure’ links failure events to safety in the same manner as the
Recommended interpretation
When dealing with safety systems, one focus is to achieve successful safety performance. This means that any failure that leads to significantly reduced safety performance, and thus increases the risk of hazardous events, may be considered critical to the safety system, in the sense that one or several safety functions, and thus the protection against such events, are down.
The term ‘critical failure’ has a sound mathematical definition; that is a failure bringing the system from an up state to a down
Conclusion
A main objective of this article has been to provide some clarification regarding misinterpretations and misunderstandings when using the term ‘safety critical failure’. The main critique is that the meaning of this term is not sufficiently clear. This issue relates to the importance of adequate risk communication. It also relates to the existence of different interpretations, each with meaningful content.
Besides, there are several associated terms, such as for example ‘critical failure’, ‘dangerous
Acknowledgements
The authors are grateful to three anonymous reviewers for their useful comments and suggestions to the original version of this article.
References (36)
- et al. Barrier management in the offshore oil and gas industry. J Loss Prev Process Ind (2015).
- Software tools to support incident reporting in safety-critical systems. Saf Sci (2002).
- et al. A framework for safety automation of safety-critical systems operations. Saf Sci (2015).
- Risk indicators for major hazards on offshore installations. Saf Sci (2010).
- Reliability Centered Maintenance: Implementation Made Simple (2006).
- Hauge S, Lundteigen MA. 2008. SINTEF report no. A8788. Guidelines for follow-up of Safety Instrumented Systems (SIS) in...
- et al. A reliability model for optimization of test schemes for fire and gas detectors. Reliab Eng Syst Saf (1994).
- HSE. 2015. The Offshore Installations (Safety Case) Regulations 2005, UK S.I. 2005/3117. Health and Safety...
- IEC 60050-191:1999. Dependability and quality of service - Chapter 191: Amendment...
- IEC 60050-192:2015. International electrotechnical vocabulary - Part 192:...
1 France: Project leader of ISO/TR 12489.