A model-based reinforcement learning approach for maintenance optimization of degrading systems in a large state space
Introduction
As modern systems become highly reliable, it is difficult to obtain sufficient failure data to construct the failure-time distribution and carry out the subsequent age-based maintenance optimization (Zhao, Gaudoin, Doyen, & Xie, 2019). In contrast, condition-based maintenance (CBM) schedules maintenance actions based on the deterioration processes of the systems or components. Advanced sensing technologies make the continuous collection of degradation data relatively easier than the collection of failure data (Skordilis & Moghaddass, 2020). As studied by Quatrini, Costantino, Di Gravio, and Patriarca (2020), in the CBM practice of manufacturing enterprises, degradation models are commonly first constructed from the monitored degradation data, from the physical degradation mechanism, or from both, and are then used to optimize the maintenance policy. However, the formula of the degradation model is usually unknown and hard to determine accurately for a system working in a dynamic, complicated environment (Alaswad & Xiang, 2017).
For example, Alawaysheh, Alsyouf, Tahboub, and Almahasneh (2020) surveyed 132 bus maintenance practitioners working in the Dubai transport sector and found that, although CBM is the most commonly practiced approach in bus maintenance, formulating the degradation of systems or components remains a challenge. Liang, Liu, Xie, and Parlikad (2020) observed that, very often, the risk posed by a deteriorating operating environment is not explicitly formulated in degradation models, especially when systems operate over a long period of time. Moreover, most studies on the CBM optimization of multi-state components and systems focus on stationary optimal maintenance actions over an infinite time horizon or in the long term. The optimal actions are then stationary in that they do not depend on the inspection times; they are fully determined by the degradation states of the components or systems.
In this study, we investigate the maintenance optimization of a deteriorating system that has a large number of degradation states over a finite planning horizon, in which the system degrades stochastically but the formula of the degradation model is unknown. We recognize that the lack of a degradation formula over a finite horizon significantly limits the application of CBM in practice (Quatrini et al., 2020). Consider situations in which a system has a finite designed lifetime or experiences a non-negligible transitional period following a new setup. In such situations, the maintenance policy needs to be scheduled over a short-to-medium horizon in a large state space, and the maintenance schedule needs to be reoptimized periodically based on the actual, adjusted dynamic degradation process of the system.
In the abovementioned situations, stationary maintenance solutions, which are determined for an infinite horizon, are not meaningful, and the optimal action for a particular degradation state of the system can vary across inspection times. That is, the optimal maintenance policy depends not only on the degradation states but also on the specific inspection times, as shown in Fig. 1.
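The dependence on the inspection time can be made explicit with the standard finite-horizon cost-to-go recursion. The notation below is assumed for illustration and is not taken from the paper: $T$ is the number of inspection epochs, $s$ a degradation state, $c(s,a)$ the one-period cost of action $a$ in state $s$, $P(s' \mid s, a)$ the (possibly unknown) state transition probability, and $V_t(s)$ the minimal expected cost from epoch $t$ onward.

```latex
\begin{aligned}
V_T(s) &= 0, \\
V_t(s) &= \min_{a \in \{\text{wait},\,\text{repair},\,\text{replace}\}}
          \Big[ c(s,a) + \sum_{s'} P(s' \mid s, a)\, V_{t+1}(s') \Big],
          \qquad t = T-1, \dots, 0, \\
\pi_t^{*}(s) &= \arg\min_{a}
          \Big[ c(s,a) + \sum_{s'} P(s' \mid s, a)\, V_{t+1}(s') \Big].
\end{aligned}
```

Because $V_{t+1}$ changes with $t$, the minimizing action for a fixed state $s$ can change with $t$ as well. In a stationary infinite-horizon formulation the value function carries no time index, which is why the resulting policy depends on the state alone.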
Traditional optimization methods, such as dynamic programming (Huang et al., 2018; Moghaddam & Usher, 2011) and heuristic algorithms (Bülbül, Bayındır, & Bakal, 2019), are hard to apply when scheduling maintenance for all degradation states over all inspection times, partly because of the large numbers of system states and inspection times and partly because of the lack of theoretical support, such as renewal theory (Alaswad & Xiang, 2017), which holds only for an infinite horizon. Furthermore, such traditional methods require collecting all the state transition data, which is often infeasible because of the unstable stochastic behavior of the system degradation process over a finite horizon.
We propose a reinforcement learning (RL) approach to determine the action for each degradation state of the system at each inspection time over the planning horizon. The system is inspected periodically; at each inspection time, the degradation level is detected and a maintenance decision must be made based on that level. The maintenance policy selects, at each inspection time, one of three candidate actions: replacement, imperfect repair, or waiting until the next inspection. The objective is to minimize the total maintenance cost over the horizon. Unlike the majority of the literature, which assumes that the formula of the degradation model is known, the proposed approach can find the optimal maintenance policy without prior knowledge of the formulation of the degradation model. In addition, the proposed RL approach does not need an explicit training set before the maintenance optimization; instead, it collects the training data by interacting with the system, and it can deal with large numbers of states. These advantages give the RL approach flexibility in implementation and a wide range of application.
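The decision setting described above can be sketched as a small tabular Q-learning loop in which the Q-table is indexed by inspection epoch as well as by degradation state. This is a minimal illustration under stated assumptions, not the authors' implementation: the number of states, the horizon, the costs, the repair effect, and the degradation increments in `step` are all hypothetical, and the simulator merely stands in for the real system with which the learner interacts.

```python
import random

# All sizes, costs, and dynamics below are hypothetical, chosen only for illustration.
N_STATES = 6                      # degradation levels: 0 (as new) ... 5 (failed)
HORIZON = 10                      # number of inspection epochs
WAIT, REPAIR, REPLACE = 0, 1, 2   # the three candidate actions
ACTION_COST = (0.0, 4.0, 10.0)    # cost of wait, imperfect repair, replacement
FAILURE_COST = 30.0               # penalty when the system reaches the failed level


def step(state, action, rng):
    """Simulated system: apply the action, then degrade until the next inspection."""
    cost = ACTION_COST[action]
    if action == REPLACE:
        state = 0                          # replacement restores the as-new level
    elif action == REPAIR:
        state = max(0, state - 2)          # imperfect repair restores only partially
    state = min(N_STATES - 1, state + rng.choice([0, 1, 1, 2]))  # random degradation
    if state == N_STATES - 1:
        cost += FAILURE_COST
    return state, cost


def q_learning(episodes=20000, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning with one Q-table per inspection epoch (costs are minimized)."""
    rng = random.Random(seed)
    q = [[[0.0] * 3 for _ in range(N_STATES)] for _ in range(HORIZON)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES)    # random start so every level is explored
        for t in range(HORIZON):
            if rng.random() < epsilon:     # explore
                action = rng.randrange(3)
            else:                          # exploit: cheapest estimated action
                action = min(range(3), key=lambda a: q[t][state][a])
            next_state, cost = step(state, action, rng)
            # No cost is incurred after the last inspection epoch.
            future = min(q[t + 1][next_state]) if t + 1 < HORIZON else 0.0
            q[t][state][action] += alpha * (cost + future - q[t][state][action])
            state = next_state
    return q


def greedy_policy(q):
    """policy[t][s] is the cheapest estimated action at epoch t in state s."""
    return [[min(range(3), key=lambda a: q[t][s][a]) for s in range(N_STATES)]
            for t in range(HORIZON)]


policy = greedy_policy(q_learning())
```

Here `policy[t][s]` may differ across `t` for the same `s`, which is the time dependence of the policy that a finite horizon introduces. The learner never reads the increments inside `step`; it only observes the returned state and cost.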
When the formula of the degradation process is unknown, which is common in practice, we propose a customized Dyna-Q method that alternates between degradation learning and CBM optimization. The learned degradation pattern serves as an estimated proxy of the system degradation process and is incorporated into the CBM optimization. Although the formula of the degradation model is not required by the proposed RL approach, we show that, if such a prior formula is available, the proposed approach can incorporate it to accelerate the CBM optimization. To demonstrate this, we customize a model-based acceleration and embed it into the classical Q-learning method, in which the degradation formula serves as an environment model. The availability of the degradation model improves the learning efficiency by providing one-step-ahead predictions of the degradation states at each inspection time.
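The alternation between degradation learning and optimization can be illustrated with a generic tabular Dyna-Q loop. This is a sketch of the textbook Dyna-Q idea under stated assumptions, not the customized method of the paper, whose CBM-oriented improvements are not specified in this excerpt. All sizes, costs, and the simulator `real_step` are hypothetical, and the learned model is assumed to be time-homogeneous, so that a stored transition can be replayed at any inspection epoch.

```python
import random
from collections import defaultdict

# Hypothetical sizes, costs, and dynamics, for illustration only.
N_STATES, HORIZON, N_ACTIONS = 6, 10, 3     # actions: 0 wait, 1 repair, 2 replace
ACTION_COST = (0.0, 4.0, 10.0)
FAILURE_COST = 30.0


def real_step(state, action, rng):
    """Stand-in for the physical system; the learner only sees its outputs."""
    cost = ACTION_COST[action]
    if action == 2:
        state = 0                           # replacement
    elif action == 1:
        state = max(0, state - 2)           # imperfect repair
    state = min(N_STATES - 1, state + rng.choice([0, 1, 1, 2]))
    if state == N_STATES - 1:
        cost += FAILURE_COST
    return state, cost


def q_update(q, t, s, a, cost, s2, alpha):
    """One-step finite-horizon Q-learning update (cost minimization)."""
    future = min(q[t + 1][s2]) if t + 1 < HORIZON else 0.0
    q[t][s][a] += alpha * (cost + future - q[t][s][a])


def dyna_q(episodes=500, planning_steps=20, alpha=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(HORIZON)]
    model = defaultdict(list)   # (state, action) -> observed (next_state, cost) samples
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for t in range(HORIZON):
            if rng.random() < epsilon:
                a = rng.randrange(N_ACTIONS)
            else:
                a = min(range(N_ACTIONS), key=lambda x: q[t][s][x])
            s2, cost = real_step(s, a, rng)
            q_update(q, t, s, a, cost, s2, alpha)     # learn from the real transition
            model[(s, a)].append((s2, cost))          # degradation learning
            visited = list(model)
            for _ in range(planning_steps):           # planning with the learned model
                ps, pa = rng.choice(visited)
                ps2, pcost = rng.choice(model[(ps, pa)])  # empirical transition sample
                pt = rng.randrange(HORIZON)           # assumes time-homogeneous dynamics
                q_update(q, pt, ps, pa, pcost, ps2, alpha)
            s = s2
    return q, model


q_table, learned_model = dyna_q()
```

Each real inspection yields one direct update plus `planning_steps` simulated updates drawn from the empirical transitions, so fewer interactions with the real system are needed for a given number of Q-updates. Replacing the sampled `model` with a known degradation formula would give a model-based acceleration in the spirit of the known-model case described above.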
The rest of the paper is structured as follows. Section 2 reviews the related literature on maintenance optimization and RL. Section 3 specifies the CBM optimization problem and presents the RL-based CBM approach for the two cases in which the degradation model is known and unknown. Section 4 presents the numerical study. Section 5 gives conclusions.
Section snippets
Literature review
Maintenance engineering plays a key role in production systems. In the context of Industry 4.0, state-of-the-art approaches to CBM, predictive maintenance (PdM), and prescriptive maintenance (PsM) have been established by considering various challenges for cyber-physical production systems. Machine-learning-based intelligent maintenance approaches have attracted increasing attention in the scientific literature due to the rapid growth in the amount of data (Carvalho et al., 2019). Baptista et al. (2018) employed
Maintenance optimization and reinforcement learning
In this section, we present the RL-based CBM optimization for a single-unit system with a single degradation process, i.e., a single unit that degrades over time. Section 3.1 details the problem of CBM scheduling and optimization under condition monitoring. Section 3.2 presents the Q-learning method with a customized model-based acceleration, denoted as Q-MA, for the case in which the degradation formula is known. Section 3.3 presents the customized Dyna-Q method with two CBM-oriented improvements,
Experimental analysis
In this section, the maintenance of light-emitting diodes (LEDs) is presented to illustrate the proposed approach. Owing to their versatility in applications and higher efficiency compared with other light sources, LEDs have revolutionized the lighting industry. Nevertheless, because of the diversity of working environments, the optimal maintenance planning of LEDs is often constrained by the unknown degradation formulation. For example, as shown by Ibrahim et al. (2018), the degradation of an LED
Conclusions
In this study, we develop a model-based RL approach for the CBM optimization of a degrading multi-state system over a finite planning horizon. The proposed Dyna-Q approach removes the requirement of an explicit degradation model and can incorporate various maintenance actions. We also show that the formula of the degradation model, if available, can speed up the RL approach by incorporating the degradation model through the model-based acceleration. It deserves highlighting that with no information
CRediT authorship contribution statement
Ping Zhang: Conceptualization; Data curation; Formal analysis; Methodology; Software; Validation; Writing – original draft, Writing – review & editing. Xiaoyan Zhu: Conceptualization; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Resources; Supervision; Validation; Writing – review & editing. Min Xie: Conceptualization; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Resources; Supervision; Validation; Writing
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (NSFC) under grants #71971206, #71571178, #71971181 and a key project grant #71731008, and Guangdong Technology International Cooperation Project (2020A0505100024). It is also partially supported by Research Grant Council of Hong Kong under a theme-based project grant (T32-101/15-R) and a GRF (CityU 11203519), and by Hong Kong Institute for Data Science (Project No. 9360163).
References (66)
- et al. (2009). Dynamic scheduling of maintenance tasks in the petroleum industry: A reinforcement approach. Engineering Applications of Artificial Intelligence.
- et al. (2017). A review on condition-based maintenance optimization models for stochastically deteriorating system. Reliability Engineering & System Safety.
- et al. (2019). Managing engineering systems with large state and action spaces through deep reinforcement learning. Reliability Engineering & System Safety.
- et al. (2015). A Bayesian approach to modeling two-phase degradation using change-point regression. Reliability Engineering & System Safety.
- et al. (2018). Forecasting fault events for predictive maintenance using data-driven techniques and ARMA modeling. Computers & Industrial Engineering.
- et al. (2019). Reinforcement learning, fast and slow. Trends in Cognitive Sciences.
- et al. (2020). Smart production planning and control in the Industry 4.0 context: A systematic literature review. Computers & Industrial Engineering.
- et al. (2019). Exact and heuristic approaches for joint maintenance and spare parts planning. Computers & Industrial Engineering.
- et al. (2017). Analysis of the reliability and the maintenance cost for finite life cycle systems subject to degradation and shocks. Applied Mathematical Modelling.
- et al. (2015). A condition-based maintenance of a dependent degradation-threshold-shock model in a system with multiple degradation processes. Reliability Engineering & System Safety.
- A systematic literature review of machine learning methods applied to predictive maintenance. Computers & Industrial Engineering.
- Condition-based maintenance using the inverse Gaussian degradation model. European Journal of Operational Research.
- Reliability analysis for dependent competing failure processes with changing degradation rate and hard failure threshold levels. Computers & Industrial Engineering.
- Train speed profile optimization with on-board energy storage devices: A dynamic programming based approach. Computers & Industrial Engineering.
- Maintenance analytics–the new know in maintenance. IFAC-PapersOnLine.
- Condition-based maintenance for long-life assets with exposure to operational and environmental risks. International Journal of Production Economics.
- A condition-based maintenance policy for degrading systems with age-and state-dependent operating cost. European Journal of Operational Research.
- Dynamic selective maintenance optimization for multi-state systems over a finite horizon: A deep reinforcement learning approach. European Journal of Operational Research.
- A procedural approach for realizing prescriptive maintenance planning in manufacturing industries. CIRP Annals.
- Preventive maintenance and replacement scheduling for repairable and maintainable systems using dynamic programming. Computers & Industrial Engineering.
- A deep reinforcement learning approach for real-time sensor-driven decision making and predictive analytics. Computers & Industrial Engineering.
- Real-time big data analytics for hard disk drive predictive maintenance. Computers & Electrical Engineering.
- Integrated architectures for learning, planning, and reacting based on approximating dynamic programming.
- Fuzzy early warning systems for condition based maintenance. Computers & Industrial Engineering.
- Condition-based maintenance for complex systems based on current component status and Bayesian updating of component reliability. Reliability Engineering & System Safety.
- Maintenance policy for a system with a weighted linear combination of degradation processes. European Journal of Operational Research.
- Dynamic maintenance policy for systems with repairable components subject to mutually dependent competing failure processes. Computers & Industrial Engineering.
- Reliability assessment of a continuous-state fuel cell stack system with multiple degrading components. Reliability Engineering & System Safety.
- Predictive maintenance in the Industry 4.0: A systematic literature review. Computers & Industrial Engineering.
- Selecting maintenance practices based on environmental criteria: a comparative analysis of theory and practice in the public transport sector in UAE/Dubai. International Journal of System Assurance Engineering and Management.
- Prima: a prescriptive maintenance model for cyber-physical production systems. International Journal of Computer Integrated Manufacturing.
- Prescriptive maintenance of CPPS by integrating multimodal data with dynamic Bayesian networks.
- Bayesian inference for reliability of systems and networks using the survival signature. Risk Analysis.