Markov counting and reward processes for analysing the performance of a complex system subject to random inspections

https://doi.org/10.1016/j.ress.2015.09.004

Highlights

  • A multi-state device is modelled in an algorithmic and computational form.

  • The performance is partitioned into multiple states and degradation levels.

  • Several types of failure are considered, with repair times depending on the degradation level.

  • Preventive maintenance is introduced in response to random inspections.

  • Performance and profitability are analysed through Markov counting and reward processes.

Abstract

In this paper, a discrete complex reliability system subject to internal failures and external shocks is modelled algorithmically. Two types of internal failure are considered: repairable and non-repairable. When a repairable failure occurs, the unit goes to corrective repair. In addition, the unit is subject to external shocks that may produce an aggravation of the internal degradation level, cumulative damage or extreme failure. When a damage threshold is reached, the unit must be removed. When a non-repairable failure occurs, the device is replaced by a new, identical one. The internal performance and the external damage are partitioned into performance levels. Random inspections are carried out. When an inspection takes place, the internal performance of the system and the damage caused by external shocks are observed and, if necessary, the unit is sent to preventive maintenance. If the inspection observes a minor state for the internal performance and/or external damage, then these states remain in memory when the unit goes to corrective or preventive maintenance. Transient and stationary analyses are performed. Markov counting and reward processes are developed in computational form to analyse the performance and profitability of the system with and without preventive maintenance. These aspects are implemented computationally with Matlab.

Introduction

The maintenance of a deteriorating system is often costly and is sometimes subject to unscheduled interruptions. Serious damage and considerable financial losses are caused when poor reliability provokes a system failure. To avoid this situation, preventive maintenance is required. System maintenance is necessary to improve overall reliability, to prevent system failures and to maintain or increase profits. The reliability of a device is improved by corrective and preventive maintenance. In recent times, diverse contributions based on reliability theory have been proposed to enhance maintenance policies.

One action to improve the reliability of a system and to increase profits is that of preventive maintenance. In this respect, Nakagawa (2005) studied standard and advanced problems of maintenance policies for system reliability models. Wu et al. (2011) developed a general periodic preventive maintenance policy for a repairable revenue-generating system, proposing a general model in which both warranty contracts and system ageing losses are incorporated in the maintenance cost model. In the present paper, the implementation of preventive maintenance actions does not have to be strictly periodic.

Optimisation problems are commonly addressed in this field. Nakagawa and Mizutani (2009) converted standard infinite-horizon maintenance models to finite-horizon ones; thus, three commonly used models of periodic replacement – with minimal repair, block replacement and simple replacement – were transformed to finite-horizon replacement models, and optimal policies for each model were analytically derived and numerically computed. Recently, Chien et al. (2012) considered a system operating over discrete time periods in which each operation period causes a random amount of damage to the system, and this damage accumulates over successive time periods. Taghipour and Banjevic (2012) proposed two optimisation models for the periodic inspection of a system with two types of components and with preventive replacement.

Markov processes are commonly used in reliability studies, which may involve preventive maintenance, external shocks and/or inspection. Soro et al. (2010) considered a continuous-time Markov process for evaluating the availability of multi-state degrading systems receiving minimal repairs and imperfect preventive maintenance. Chen et al. (2003) proposed a state-time-dependent preventive maintenance policy for a multi-state deteriorating system which was given regular inspections. Chakravarthy (2012) conducted a steady-state analysis of a system subject to external shocks, which cause the system to deteriorate and possibly fail. Guilani et al. (2014) considered a method based on a Markov process to evaluate the reliability of non-repairable three-state systems. Markov theory has also been used to model statistical dependence between two systems or components in reliability theory. Mytalas and Zazanis (2014) considered Markov-dependent multi-type sequences and studied various kinds of runs by examining additive functionals based on state visits and transitions in an appropriately constructed Markov chain. Eryilmaz (2014) provided a way of modelling s-dependence between two multi-state components. Ramírez-Cobo et al. (2014) considered the Markovian arrival process to model dependent and non-exponentially distributed observations.

Nowadays, multi-state systems are of particular importance in ensuring reliability. Multi-state systems with different structures are at the centre of attention owing to their wide applications in engineering. Most texts on reliability theory analyse systems in which the units perform in terms of traditional binary models, but many real-life systems, termed multi-state systems, are composed of multiple components with different performance levels and incorporating several failure modes. Lin et al. (2015) introduced, in a multi-state system, two mutually exclusive types of random shocks, extreme and cumulative, where the random shocks were independent of the degradation process but could influence it. Li and Peng (2014) proposed an analytical approach to calculate the system availability and the operation cost of a multi-state series–parallel system. Li et al. (2014) proposed an analytical method based on multi-state multi-valued decision diagrams for computing the integrated importance measure values. A dynamic model was developed by Eryilmaz and Bozbulut (2014) for the availability assessment of multi-state weighted k-out-of-n systems. Faghih-Roohi et al. (2014) also analysed multi-state weighted k-out-of-n systems; the availability and capacity of the components of such systems, where states are allowed to change over time, are optimised by genetic algorithms.

When complex systems are modelled, intractable expressions are often encountered. One class of probability distributions that makes it possible to model complex systems with well-structured results, thanks to its matrix-algebraic form, is the phase-type (PH) distribution, which was introduced and analysed in detail by Neuts (1981). Due to their valuable properties, many varieties of this class of distributions have been considered in diverse branches of science and engineering, and applied in reliability studies. Neuts (1975) pointed out that any discrete distribution with finite support is a discrete PH distribution with a corresponding representation. These characteristics account for the widespread use of PH distributions in stochastic modelling. In this respect, Pérez-Ocón and Ruiz-Castro (2004) used PH distributions to model several reliability systems that evolve in continuous time.
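To make the matrix-algebraic form concrete, the following is a minimal sketch of how the probability mass function of a discrete PH distribution could be evaluated. It assumes the standard representation (α, T), with P(X = k) = α T^(k−1) T⁰ for k ≥ 1, T⁰ = e − Te, and no probability mass at zero. The function names and all numerical values are hypothetical and are not taken from the paper, whose own implementation is in Matlab.

```python
def vec_mat(v, m):
    """Multiply a row vector v by a matrix m (list of rows)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def ph_pmf(alpha, T, k_max):
    """Return [P(X=1), ..., P(X=k_max)] for a discrete PH distribution (alpha, T)."""
    # Absorption probability from each transient phase: T0 = e - T e
    t0 = [1.0 - sum(row) for row in T]
    pmf = []
    v = list(alpha)  # holds alpha * T^(k-1) at the start of iteration k
    for _ in range(k_max):
        pmf.append(sum(vi * ti for vi, ti in zip(v, t0)))
        v = vec_mat(v, T)
    return pmf


# Hypothetical example 1: one phase, i.e. a geometric distribution with p = 0.3
geo = ph_pmf([1.0], [[0.7]], 5)

# Hypothetical example 2: two phases visited in series
alpha = [1.0, 0.0]
T = [[0.5, 0.3],
     [0.0, 0.6]]
pmf = ph_pmf(alpha, T, 200)
```

With a single phase the representation reduces to the geometric distribution, which is a convenient sanity check before moving to larger phase spaces.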

Rewards can also be introduced in a Markovian structure, and this is of interest because Markov reward processes can accurately model practical systems that evolve stochastically over time. A Markov reward process consists of a Markov environment and an associated reward structure. Li (2010) studied Markov reward processes for an irreducible continuous-time level-dependent quasi-birth-and-death process, with finitely many or infinitely many levels, and provided an introduction to Markov reward processes. Li and Peng (2014) used a Markov reward model to calculate the operation cost associated with a multi-state series–parallel system.

Reliability systems are usually studied in the continuous case; nevertheless, not all systems can be continuously monitored, and some can only be observed at certain times. This may be due to limitations such as inspection intervals and/or the inner structure of the system. Discrete reliability systems have been considered for analysing the behaviour of devices in fields such as electronic engineering. Discrete PH distributions have also been considered in the reliability field, obtaining computationally well-structured results. Huang and Yuan (2010) modelled a two-stage preventive maintenance policy for a multi-state deterioration system under periodic inspection and with multiple candidate actions for preventive maintenance, using a multi-state discrete-time Markov chain. Recently, Ruiz-Castro and Fernández-Villodre (2012) described the behaviour of a complex warm discrete standby system with loss of units.

This paper extends previous studies in this area through the following contributions. The behaviour of a system composed of a device that is subject to internal failures, repairable or non-repairable, and/or exposed to external shocks is analysed. The system evolves, or is observed, in discrete time. The internal performance of the device is subject to degradation. The system occupies several internal degradation levels (minor and major), each of which is partitioned into different damage states. External shocks can occur and, when they happen, the internal performance of the unit can be aggravated, the unit may suffer a non-repairable failure (extreme failure) and it can undergo damage (called external damage). Several types of repair are considered: corrective repair, which is carried out when an internal repairable failure occurs, and preventive maintenance, which is performed in response to random inspections. This is the most general situation; a particular case of random inspection is periodic inspection. When an inspection takes place, if any significant internal or external damage is observed, the device is sent for preventive maintenance. If the device goes to preventive maintenance for the internal and/or external damage or a corrective internal failure occurs, the unit is sent to the repair facility; the minor internal/external damage, if any, is saved in memory, but not repaired. Different preventive and corrective repair times are assumed for the internal degradation levels and for the external damage observed. When the device undergoes a non-repairable failure, internal or external, it is replaced by a new, identical one. Rewards and costs are introduced according to the corrective and/or preventive maintenance performed and the damage suffered by the device. One of the main objectives of this paper is to analyse the performance and costs of the system in an algorithmic-computational form.
Accordingly, we focus on Markov counting and reward processes, which are introduced in order to analyse the number of times that the system is in each of the different states, the mean duration of each such period, the effectiveness of the action taken and the resulting profitability. The study is performed in transient and stationary regimes. All results are expressed in algorithmic form and are implemented computationally using Matlab. Particular cases can be obtained from the system proposed in this paper in an algorithmic form by considering this methodology.

The system described in this paper is applicable to real-life systems, in fields such as civil, computational and industrial engineering. For instance, a diesel motor must be given preventive maintenance in order to increase its reliability and decrease costs; another example is that of the maintenance of a computer server, with particular respect to its hard drives, which are subject to both internal failure and external shocks. Preventive maintenance may avoid or delay a hard drive failure.

The rest of this paper is organised as follows. The system and the state space are detailed in Section 2. The Markovian model is described in detail in Section 3. In Section 4 we obtain the stationary distribution in an algorithmic form. The availability and the reliability are given in Section 5. Section 6 focuses on Markov counting processes. In Section 7, rewards and costs are introduced and Markov reward processes are defined. A numerical application is given in Section 8, showing the conditions for cost-effective preventive maintenance. Conclusions are drawn in Section 9.

Description and modelling of the system

In this section the system is described in detail and the state space is developed.

The Markovian model

The behaviour of the system is governed by a vector Markov process $\{X_n;\ n \geq 0\}$, with the macro-state space $E$ defined above. The transition probability matrix has a block matrix structure. Each block contains the transition probabilities between two macro-states. It has been developed in an algorithmic and computational form. Some auxiliary matrices are considered for building several matrix blocks. Thus, the matrices $U_l$ and $V_l$ are square matrices of order $m$ and $d$ respectively, whose element (s, t)

Stationary distribution

The stationary distribution was determined algorithmically and computationally, and partitioned according to the macro-states defined in Section 2.2. It is denoted by $\pi = \{\pi_0, \pi_1, \pi_2, \pi_3, \pi_4, \pi_5, \pi_6, \pi_7, \pi_8, \pi_9, \pi_{10}\}$. The vector $\pi_i$ contains the probability of the device occupying the different phases of the macro-state $i$ in the stationary regime. As is well known, the stationary distribution verifies the balance equations $\pi = \pi P$, where $P$ is the transition probability matrix given in (1). The block-structured
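As an illustration of the balance equations π = πP only, the sketch below approximates the stationary vector of a small chain by power iteration. This is a swapped-in technique, not the block-wise solution developed in the paper, and the three-state chain (operational, degraded, under repair) is a hypothetical example whose values are assumptions.

```python
def vec_mat(v, m):
    """Multiply a row vector v by a matrix m (list of rows)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def stationary(P, tol=1e-13, max_iter=100000):
    """Approximate pi with pi = pi P by repeated multiplication (power iteration).

    Assumes P is the transition matrix of an irreducible, aperiodic chain.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = vec_mat(pi, P)
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical chain: state 0 = operational, 1 = degraded, 2 = under repair
P = [[0.90, 0.08, 0.02],
     [0.00, 0.85, 0.15],
     [0.60, 0.00, 0.40]]

pi = stationary(P)
residual = max(abs(a - b) for a, b in zip(pi, vec_mat(pi, P)))
```

For a large block-structured matrix, exploiting the blocks as the paper does is generally preferable; the iteration above is only meant to show what the balance equations require of π.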

System availability and device reliability

Several classical measures associated with a reliability system are discussed in this section and determined computationally and algorithmically. We focus on the availability of the system and on the reliability of the device.
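The two measures can be sketched on a toy chain under textbook definitions, which are assumed here because the paper's own expressions are not reproduced in this section: availability at time n is the probability of occupying an operational state at time n, and reliability at time n is the probability of having remained in operational states at every time up to n. The chain, the set of operational states and the function names are hypothetical.

```python
def vec_mat(v, m):
    """Multiply a row vector v by a matrix m (list of rows)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def availability(alpha, P, up, n):
    """P(chain is in an operational state at time n), starting from alpha."""
    v = list(alpha)
    for _ in range(n):
        v = vec_mat(v, P)
    return sum(v[j] for j in up)


def reliability(alpha, P, up, n):
    """P(chain has stayed in operational states at times 0, 1, ..., n)."""
    # Restrict the chain to operational states; probability mass that leaves is lost
    T = [[P[i][j] for j in up] for i in up]
    v = [alpha[i] for i in up]
    for _ in range(n):
        v = vec_mat(v, T)
    return sum(v)


# Hypothetical chain: state 0 = operational, 1 = degraded, 2 = under repair
P = [[0.90, 0.08, 0.02],
     [0.00, 0.85, 0.15],
     [0.60, 0.00, 0.40]]
alpha = [1.0, 0.0, 0.0]  # the device starts as new
up = [0, 1]              # assumed set of operational states

a2 = availability(alpha, P, up, 2)
r2 = reliability(alpha, P, up, 2)
```

Because repair returns the device to service, availability at a given time is never smaller than reliability at that time.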

Markov counting processes

Counting processes are frequently used in analysing system reliability. These processes are used as models for counting events that occur over time. A detailed description is given by Ross (1995) and Kulkarni (1999). When a Markov chain is considered, several associated counting processes can be defined, and here we focus on these processes.
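One simple counting quantity is the expected number of transitions of a given type up to time n, which for a Markov chain equals the sum over k = 0, …, n−1 of (αP^k)_i P_ij over the chosen pairs (i, j). The sketch below evaluates this sum for a hypothetical three-state chain, counting entries into a repair state; the chain, the chosen pairs and the function names are assumptions, and the counting processes defined in the paper are more detailed.

```python
def vec_mat(v, m):
    """Multiply a row vector v by a matrix m (list of rows)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def expected_transitions(alpha, P, pairs, n):
    """Expected number of transitions i -> j, (i, j) in pairs, during the first n steps."""
    v = list(alpha)  # distribution of the chain at the current time
    total = 0.0
    for _ in range(n):
        total += sum(v[i] * P[i][j] for i, j in pairs)
        v = vec_mat(v, P)
    return total


# Hypothetical chain: state 0 = operational, 1 = degraded, 2 = under repair
P = [[0.90, 0.08, 0.02],
     [0.00, 0.85, 0.15],
     [0.60, 0.00, 0.40]]
alpha = [1.0, 0.0, 0.0]

# Count every entry into repair from an operational state
repair_starts = [(0, 2), (1, 2)]
mean_repairs_2 = expected_transitions(alpha, P, repair_starts, 2)
```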

Rewards

It is essential to incorporate costs and rewards into a reliability system in order to assess its profitability. In this section a cost-reward vector is defined according to the phases associated with the system. In addition, Markov reward processes are defined in order to calculate the cost arising from the preventive and corrective repairs required up to a certain time, the costs of new units installed up to a certain time, the net reward obtained while the device is operational during a
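The idea of accumulating rewards over a Markov chain can be sketched as follows, assuming a net reward vector c with one entry per state, earned once per unit of time spent in that state. The expected net reward accumulated over times 0, …, n is then the sum over k of αP^k c. The chain, the reward values and the function name are hypothetical and do not correspond to the cost structure defined in the paper.

```python
def vec_mat(v, m):
    """Multiply a row vector v by a matrix m (list of rows)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def expected_cumulative_reward(alpha, P, c, n):
    """Expected net reward accumulated over times 0, 1, ..., n."""
    v = list(alpha)
    total = 0.0
    for _ in range(n + 1):
        total += sum(vi * ci for vi, ci in zip(v, c))  # reward earned at this time
        v = vec_mat(v, P)
    return total


# Hypothetical chain: state 0 = operational, 1 = degraded, 2 = under repair
P = [[0.90, 0.08, 0.02],
     [0.00, 0.85, 0.15],
     [0.60, 0.00, 0.40]]
alpha = [1.0, 0.0, 0.0]
c = [10.0, 6.0, -4.0]  # assumed net reward per unit of time in each state

reward_1 = expected_cumulative_reward(alpha, P, c, 1)
```

Comparing this quantity for two transition matrices, one with and one without a maintenance action, is one way to check whether the action pays for itself over a given horizon.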

Numerical application

This section presents a numerical example to illustrate the versatility of the model, modelling and analysing, in an algorithmic form, the effect of preventive maintenance introduced in a complex reliability system such as that described in this paper. When preventive maintenance is introduced, new costs are incurred and the performance of the device usually improves. The following question then arises: is it profitable from an economic standpoint? In other words, does the performance

Conclusions

A preventive maintenance policy for improving the performance of a complex device exposed to internal failure, repairable or non-repairable, and to external shocks is described and developed in an algorithmic-computational form. Preventive maintenance is considered in order to optimise the main measures of the system, and it is performed in response to inspections carried out at random intervals. The device is a multi-state unit and several degradation levels are associated with the internal

Appendix A

The computational matrix-block expressions that constitute the transition probability matrix given in (1) are shown.

Transition from E0:

B00=TLM+U1TLM0η,
B01=TWL0γωM+U1TWL0γωV1M0η,
B02=(B02(2),…,B02(ρ)), B02(l)=Ul(eT0)LM0βpr0,l, l=2,…,r,
B04=(B04(2,0),…,B04(ρ,0),B04(2,1),…,B04(ρ,1),B04(2)),
B04(l,0)=UlTWeL0γωV1M0βpr0,l, l=2,…,ρ,
B04(l,1)=UlTWeL0γωV2eM0βpr1,l, l=2,…,ρ,
B04(2)=U1TWL0γωV2eM0βpr2,
B06=(B06(1),…,B06(ρ)), B06(l)=UlTr0Leεβcr0,l, l=1,…,ρ,
B08=(B08(1,0),…,B08(ρ,0),B08(1,1),…,B08(

Appendix B

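The balance equations can be expressed in block form, as given in Section 4. They have been solved by blocks, and the solution is given by $\pi_j = \pi_0 R_j$, $j = 0, 1, \ldots, 10$, where

$$
\begin{aligned}
R_0 &= I,\\
R_1 &= [W_{01} + W_{04}B_{41} + W_{04}V_{45}B_{51} + W_{08}B_{81} + W_{08}V_{89}B_{91}]\,[I - W_{11} - W_{14}B_{41} - W_{14}V_{45}B_{51} - W_{18}B_{81} - W_{18}V_{89}B_{91}]^{-1},\\
R_2 &= W_{02} + R_1 V_{1,10} B_{02},\\
R_3 &= R_2 V_{23},\\
R_4 &= W_{04} + R_1 W_{14},\\
R_5 &= R_4 V_{45},\\
R_6 &= W_{06} + R_1 V_{1,10} B_{06},\\
R_7 &= R_6 V_{67},\\
R_8 &= W_{08} + R_1 W_{18},\\
R_9 &= R_8 V_{89},\\
R_{10} &= V_{0,10} + R_1 V_{1,10},
\end{aligned}
$$

where $V_{ij} = B_{ij}(I - B_{jj})^{-1}$ for (i, j)=(2,3), (4,5), (6,7), (8,9), $V_{ij} = B_{ij}(I - B_{0,10})^{-1}$ for (i, j)=(0,10),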

Appendix C

Given the rewards and costs in Section 7.1, the vector cEj that contains the net reward associated with the phases of the macro-state Ej is built. These vectors have the following expressions:

cE0=(BC)emtεcintetε,
cE10=(BCcnr)emtεcintetε,
cE1=(BC)emtdεcintetdεemtcexteε,
cE2=(etc2pr0+(V+Rpr02)etzpr0,2,…,etcρpr0+(V+Rpr0ρ)etzpr0,ρ)′,
cE3=(etc2pr0+Vetzpr0,2,…,etcρpr0+Vetzpr0,ρ)′,
cE4=(etdc2pr0+(V+Rpr02)etdzpr0,2,…,etdcρpr0+(V+Rpr0ρ)etdzpr0,ρ,etc2pr1+(V+Rpr12)etzpr1,2,…,etcρpr1+(V+R

Acknowledgements

This paper is partially supported by the Junta de Andalucía, Spain, under grant FQM-307 and by the Ministerio de Economía y Competitividad, Spain, under grant MTM2013-47929-P.
