Stochastic hybrid automaton model of a multi-state system with aging: Reliability assessment and design consequences
Introduction
Traditional techniques of reliability assessment were developed under hypotheses that simplify many real-life boundary conditions. This is probably the price paid in the early '30s, when reliability theory started to gain importance and, under the pressure of new technological advances in the military, maritime, Oil & Gas and aircraft industries, grew fast and rigid. Such industrial applications can never stop: they work under well-defined conditions and perform within narrow operating margins for the entire mission time. For these applications, the definition of reliability offered a well-defined theoretical domain, so that the mathematics built around it turned out elegant and robust [1]. From a practical point of view, the theory was applicable thanks to the well-known engineering stratagem of always assuming the worst-case scenario.
Nowadays, the improvement of this conservative approach has led to the conception of a new research field that encompasses many different subjects and goes under the name of performability [2].
Dependability is one of the attributes of performability; it represents an extension of reliability and deals with reliability, availability, safety and related measures of interest for a system [3], [4]. In turn, dependability assessment also considers non-functional aspects related to the functioning of a system, such as reconfiguration, fault tolerance, interferences or dependencies and, more generally, its dynamic evolution. These elements make it possible to overcome the binary nature (i.e., failed or working) of the operating state of a component and to relax the hypotheses of traditional reliability theory (i.e., statistical independence), giving room to more insightful analyses, including performance evaluation in degraded conditions and the evolution of the system under different environmental and operational conditions. In practice, a complex system works under a multitude of different conditions whose sequence and durations can be stochastic or deterministic; therefore, its operating rules and performance can change dramatically.
The need for more realistic reliability assessments started within the field of nuclear engineering, with the definition of Dynamic Probabilistic Risk Assessment (DPRA) [5]. This class of problems, also known as Hybrid Stochastic systems [6], is characterized by the coupling of a physical deterministic model (i.e., a first-principles model) with a stochastic one. In this coupling, a stochastic event can trigger a change in the deterministic model and, conversely, a variation of the deterministic model modifies the operational conditions and the probability functions of the stochastic one.
In the literature, DPRA is also referred to as Probabilistic Dynamics or Dynamic Reliability [7], [8], [9], [10]; the term dynamic is used to address changes in the environmental and operational conditions. Another main difference with respect to traditional quantitative techniques [2] like Static Fault Trees (SFT) and Reliability Block Diagrams (RBD) is the possibility of modelling the aging effects in a system more accurately. As a matter of fact, in a DPRA a component degrades only during the intervals of time in which it is operating, as opposed to conventional analyses where aging factors are usually given as inputs to the analysis.
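The idea that a component ages only while it is operating can be illustrated with a minimal Monte Carlo sketch. It is not taken from the paper: the periodic on/off duty cycle, the Weibull lifetime expressed in operating hours, and the names `calendar_failure_time` and `reliability` are all assumptions made for illustration.

```python
import random


def calendar_failure_time(op_life, on, off):
    """Map a lifetime measured in operating hours to calendar hours.

    Hypothetical duty cycle: the component runs for `on` hours, rests for
    `off` hours, and repeats. Age accumulates only while it is running.
    `on` must be positive; `off` may be zero (always-on component).
    """
    full_cycles = int(op_life // on)
    remainder = op_life - full_cycles * on
    if remainder == 0 and full_cycles > 0:
        # Failure happens exactly at the end of an operating interval.
        return full_cycles * (on + off) - off
    return full_cycles * (on + off) + remainder


def reliability(mission, on, off, scale, shape, runs=10_000, seed=1):
    """Monte Carlo estimate of surviving `mission` calendar hours."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(runs):
        # Assumed Weibull lifetime, expressed in operating hours.
        op_life = rng.weibullvariate(scale, shape)
        if calendar_failure_time(op_life, on, off) > mission:
            survived += 1
    return survived / runs


if __name__ == "__main__":
    # Same component, two usage profiles (all numbers are illustrative).
    print(reliability(1000, on=24, off=0, scale=800, shape=2.0))
    print(reliability(1000, on=8, off=16, scale=800, shape=2.0))
```

With the same seed, the intermittently operated component can never show a lower estimate than the always-on one, because each sampled operating lifetime maps to a calendar time that is at least as long.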
At the state of the art, both analytical and simulation techniques can be used to handle a DPRA problem. Among the former, Piecewise Deterministic Markov Processes (PDMP) [10], [11], [12] and Regime Switching Modelling (RSM) [13], [14] are solid mathematical frameworks able to model both aging effects and system evolution. PDMP models the aging evolution of a system with a set of differential equations, while RSM models the dynamic change of the system with a sequence of Continuous-Time Markov Chains (CTMC), each one describing a particular type of environmental/operational condition. An alternating renewal process governs the regime switching from one CTMC to another. State-space modelling has also recently been applied to dynamic reliability with aging; [9] offers an elegant review of the most suitable analytical methods, including Generalised Stochastic Markov Processes (GSMP), able to address several dynamic reliability behaviours such as fault coverage and load sharing. Moreover, it provides a useful guideline flowchart that shows which modelling approach is best suited to the problem at hand.
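As a rough illustration of the regime-switching idea, the sketch below simulates a single component whose failure rate depends on the current regime, with two regimes alternating according to an alternating renewal process. It is a simplification under stated assumptions, not the formulation of [13], [14]: each regime's CTMC is reduced to a working state and an absorbing failed state, sojourn times are assumed exponential, and every name and number is hypothetical.

```python
import math
import random


def sample_failure_time(rates, mean_sojourn, rng, horizon):
    """Sample the failure time of a component under two alternating regimes.

    rates[i]        : constant failure rate while in regime i (must be > 0)
    mean_sojourn[i] : mean duration of a stay in regime i (assumed exponential)
    Returns math.inf if no failure is sampled before the regime clock
    passes `horizon`.
    """
    t = 0.0
    regime = 0
    while t < horizon:
        stay = rng.expovariate(1.0 / mean_sojourn[regime])
        # Memoryless failure law: resampling at each switch is exact.
        time_to_failure = rng.expovariate(rates[regime])
        if time_to_failure < stay:
            return t + time_to_failure
        t += stay
        regime = 1 - regime  # alternate between regime 0 and regime 1
    return math.inf


def reliability(mission, rates, mean_sojourn, runs=20_000, seed=1):
    """Monte Carlo estimate of the probability of no failure up to `mission`."""
    rng = random.Random(seed)
    survived = sum(
        sample_failure_time(rates, mean_sojourn, rng, mission) > mission
        for _ in range(runs)
    )
    return survived / runs


if __name__ == "__main__":
    # Mild regime 0 and harsh regime 1 (illustrative values only).
    print(reliability(100.0, rates=(0.01, 0.05), mean_sojourn=(5.0, 20.0)))
```

When both regimes share the same failure rate, the estimate should approach the exponential reliability exp(-rate * mission), which gives a simple sanity check for the sketch.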
With increasing computing power, simulation can be a valid alternative and may in fact be the most suitable approach for complex DPRA models. In particular, simulation allows the analysis of systems with a non-Markovian structure, as shown in [15] for an application in the Oil & Gas sector. Simulation can be implemented with many different tools, from a spreadsheet [16] to well-known high-level tools like Simulink [17], and can benefit from several speed-up algorithms [18], [19]. Other research contributions show how hybrid stochastic models like Fluid Stochastic Petri Nets (FSPN) [20] and Stochastic Activity Networks (SAN) [21], [22] can be used to implement a continuous process with stochastic features and simulate dynamic reliability problems. In fact, besides the elements of traditional Petri Nets, hybrid stochastic models present additional objects that allow the characterisation of a continuous/discrete marking and of time-dependent activities. Such models are then solved using a discrete event simulation engine. Although these modelling formalisms are by now well established in industry, it must be pointed out that the flat representation of a complex Petri Net, made up of places and activities, can become large and difficult to interpret, even more so when describing a continuous process typical of a mechanical or physical system. Hierarchy is a feature that has often been used to alleviate this issue; for instance, SHARPE [23] can combine SFT, GSPN and Markov Chains, RAATSS [24] supports DFT and ATS, while MÖBIUS [25], [26] offers high-level constructs (JOIN and REP) to build composed hierarchical SAN models based on simpler atomic models that can be developed independently, replicated and joined. In particular, the REP construct replicates an atomic model, while the JOIN construct combines two or more atomic models on the basis of a set of shared variables.
These hybrid formalisms are very powerful and general but do not offer any high-level construct for modelling systems characterized by physical and mechanical interactions. In these cases, the main drawback is the effort linked with the maintenance and the handover of such models. For these reasons, the authors recognise the importance of further investigating promising hybrid modelling approaches such as the Stochastic Hybrid Automaton (SHA) for the resolution of DPRA problems, like those shown in [6], [27], [28], [29], and the use of other tools better suited to describing dynamic systems. Among the several attributes of dependability, this work deals with reliability assessment from a dynamic reliability point of view.
Therefore, starting from the definition of system reliability, the DPRA modelling is gradually introduced with the inclusion of the aging effects and of the dynamic changes of the working/operative conditions of a system.
This first section clarifies why analytical techniques like PDMP, GSMP or RSM fail to solve such models. Afterwards, a non-trivial case study for the reliability assessment, possessing all the characteristics of a DPRA problem, will be presented to better show the dynamic multi-state nature of a DPRA model.
The attempt to use a RAMS technique like DFT, BDMP or DBRD to solve the case study will confirm the limits of these reliability techniques and highlight the need for a more powerful class of basic events (which will be called Hybrid Basic Events, HBE) that better suits the modelling characteristics of a DPRA problem.
In this paper, the solution of a DPRA problem is tackled with a Monte Carlo simulation applied to an architecture of concurrent models, based on the separation of concerns [30], [31] and on SHA modelling [6], [27], [28], [29]. In a recent paper [48], a formal SHA model (called SHyFTA) extending the MatCarloRe tool for dynamic reliability with DFT was introduced, but it did not consider the importance of the dimensioning activity, which cannot be neglected when performing a dynamic reliability assessment. In fact, as will be shown in this paper, the reliability of a system is strongly affected by the working and operational conditions in which the system operates, and the activity of plant dimensioning (i.e., the choice of the correct system tuning) can extend the system life, reducing the aging and the wear-out of the system.
The proposed architecture offers several benefits. First, the modelling effort is reduced because the original DPRA problem can be broken down into two different simulation models, the physical and the stochastic, that are individually simpler to implement. Moreover, this architecture makes it easy to describe the multi-state nature of a system in terms of mechanical performance, degradation, variation of independent physical variables (like temperatures, pressures and aging), and change of failure and stochastic characteristics, so that it is able to capture any cumulative damage behaviour.
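The separation between a physical and a stochastic model coupled through shared variables can be sketched in a few lines of time-stepped Monte Carlo code. This is only an illustrative toy, not the SHA-HPM implementation of the paper: the first-order thermal balance, the exponential dependence of the failure rate on temperature, and all constants and names below are assumptions.

```python
import math
import random

# Hypothetical parameters in arbitrary but consistent units.
T_AMB = 20.0         # ambient temperature
T_NOM = 25.0         # temperature below which the failure rate is nominal
T_SCALE = 10.0       # temperature rise that multiplies the rate by e
HEAT_LOAD = 5.0      # heat generated by the equipment
K_LOSS = 0.1         # passive heat loss coefficient
C_TH = 50.0          # thermal capacity of the room
BASE_RATE = 1e-4     # nominal failure rate of the equipment
COOLER_RATE = 2e-4   # failure rate of the cooling unit


def simulate_once(rng, cooling_power, mission=1000.0, dt=2.0):
    """Return True if the equipment survives the whole mission."""
    temp = T_AMB      # shared variable: written by the physical model
    cooler_up = True  # shared variable: written by the stochastic model
    t = 0.0
    while t < mission:
        # Stochastic model: the cooling unit may fail during this step.
        if cooler_up and rng.random() < 1.0 - math.exp(-COOLER_RATE * dt):
            cooler_up = False
        # Physical model: explicit Euler step of the thermal balance.
        removed = cooling_power if cooler_up else 0.0
        temp += dt * (HEAT_LOAD - removed - K_LOSS * (temp - T_AMB)) / C_TH
        # Stochastic model: failure rate grows with the shared temperature.
        rate = BASE_RATE * math.exp(max(0.0, temp - T_NOM) / T_SCALE)
        if rng.random() < 1.0 - math.exp(-rate * dt):
            return False
        t += dt
    return True


def estimate_reliability(cooling_power, runs=400, seed=1):
    """Fraction of simulated missions completed without equipment failure."""
    rng = random.Random(seed)
    return sum(simulate_once(rng, cooling_power) for _ in range(runs)) / runs


if __name__ == "__main__":
    for power in (1.0, 3.0, 5.0):  # candidate sizings of the cooling unit
        print(power, estimate_reliability(power))
```

Sweeping `cooling_power` in this way mimics, in a very reduced form, the use of the simulation as a sizing aid: an undersized unit lets the temperature drift upwards, which in turn raises the failure rate seen by the stochastic model.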
Thus, the main contributions of this paper can be summarised as follows:
1. It offers a review of dynamic reliability, of the limits of analytical techniques and of the opportunity to adopt a Stochastic Hybrid Automaton simulation model;
2. It introduces the hybrid basic event [48] as a general concept for dynamic reliability modelling, able to model the multi-state nature of a component under dynamic working and environmental conditions;
3. For the case study, it presents the codification of a Stochastic Hybrid Automaton that can be used as a sizing tool for finding the optimal trade-off between system reliability and performance. Moreover, it can be adopted as a reference model, alternative to the SHA solutions discussed in [6], [27], [28], [29], [48].
The remainder of this paper is organised as follows: Section 2 introduces dynamic reliability. Section 3 presents the case study of a Data Cluster system, showing its characteristics in terms of DPRA. Section 4 presents the SHA simulation model of the case study, while Section 5 discusses the results of the reliability assessment with respect to several sizing configurations of the system under evaluation. Section 6 contains a discussion summarizing the advantages and drawbacks of the SHA-HPM, including the Simulink implementation offered. Finally, Section 7 provides conclusions and outlines directions for future research.
Section snippets
Literature review on dynamic probabilistic risk assessment
DPRA aims to relax the rigid hypotheses of traditional RAMS techniques, focusing on systems that operate in variable and dynamic conditions.
It can consider numerous characteristics of complex systems, such as environmental dependencies, interactions between continuous process variables and system components, and stochastic and deterministic behaviours evolving in time. As a matter of fact, a component does not always operate around the nominal design operative conditions, resulting in
Case study: dynamic reliability assessment of a Data Cluster
Fig. 2 shows the layout of a Data Cluster installation. This system includes a service facility that maintains the conditions required for the correct functioning of the Data Cluster. The service facility (e.g., the air conditioning system) consists of an Internal Unit (IU), the Air Treatment Unit (ATU), and an External Unit (EU). The ATU permits the evaporation of the coolant, cooling the internal environment, while the EU performs the compression and condensation of the coolant
Implementation of the SHA-HPM simulation
As discussed in the previous section, the DPRA has to be solved via simulation, using a concurrent simulation approach. It was implemented with Simulink, a block diagram environment of the Matlab suite [42]. Simulink was chosen because it can be effectively used to simulate complex concurrent models and to solve dynamic systems. Complex logic can be implemented using Boolean logic blocks, Switch blocks, Memory blocks and Assertion blocks. The latter, in particular,
Simulation campaign
At first, the reliability of the Data Cluster system was studied with the reliability models of Fig. 4. These representations are equivalent and become valid as a result of a dimensioning process aimed at preventing overheating in the technical room where the Data Cluster is placed. Specifically, the air conditioning system is a component that engineers decide to install to improve the reliability of the Data Cluster. In fact, without the air conditioning system the
Discussion
This section contains a brief discussion about the SHA-HPM methodology and the Simulink implementation shown in this paper, providing information about modelling efforts, computational aspects and related benefits and drawbacks.
The first aspect to highlight is that the modelling effort of a SHA-HPM is lower than that of an analytical model. This benefit is linked with the simulation nature of the SHA-HPM, which places no limitations on the number of components, inter-dependencies, working behaviours and
Conclusions
Traditional RAMS techniques cannot be used to analyse systems featuring aging and dynamic change of boundary conditions. Dynamic reliability arises from the awareness that the performance of a system is tightly interconnected with its failure behaviour and, consequently, a holistic design of a plant solution cannot disregard this combination of behaviours.
In this paper, a Stochastic Hybrid Automaton model has been created to assess the dynamic reliability of a multi-state system with aging,
References (48)
Risk assessment for dynamic systems: an overview. Reliab Eng Syst Saf (1994)
Probabilistic dynamics: a comparison between continuous event trees and discrete event tree model. Reliab Eng Syst Saf (1994)
et al. A concept paper on dynamic reliability via Monte Carlo simulation. Math Comput Simul (1998)
et al. Investigating dynamic reliability and availability through state–space models. Comput Math Appl (2012)
et al. The development and application of dynamic operational risk assessment in oil/gas and chemical process industry. Reliab Eng Syst Saf (2010)
et al. Dynamic fault tree resolution: a conscious trade-off between analytical and simulative approaches. Reliab Eng Syst Saf (2011)
et al. MatCarloRe: an integrated FT and Monte Carlo Simulink tool for the reliability assessment of dynamic fault tree. Exp Syst Appl (2012)
et al. System resiliency quantification using non-state-space and state-space analytic models. Reliab Eng Syst Saf (2013)
et al. Conception of Repairable Dynamic Fault Trees and resolution by the use of RAATSS, a Matlab® toolbox based on the ATS formalism. Reliab Eng Syst Saf (2014)
et al. Limiting the loss of information in KNXnet/IP on congestion conditions. Comput Netw (2014)
State/event fault trees: a safety analysis model for software-controlled systems. Reliab Eng Syst Saf
Performability analysis of clustered systems with rejuvenation under varying workload. Perform Eval
A Weibull-based compositional approach for hierarchical dynamic fault trees. Reliab Eng Syst Saf
Algebraic determination of the structure function of Dynamic Fault Trees. Reliab Eng Syst Saf
SHyFTA, a Stochastic Hybrid Fault Tree Automaton for the modelling and simulation of dynamic reliability problems. Exp Syst Appl
Life data analysis
System reliability theory: models, statistical methods, and applications
Handbook of performability engineering
Stochastic modeling formalisms for dependability, performance and performability
Basic concepts and taxonomy of dependable and secure computing. IEEE Trans Dependable Secur Comput
Piecewise deterministic Markov processes and dynamic reliability. Proc Inst Mech Eng
Modeling the evolution of system reliability performance under alternative environments. IIE Trans