A deep reinforcement learning approach for rail renewal and maintenance planning

https://doi.org/10.1016/j.ress.2022.108615

Highlights

  • Develop a deep reinforcement learning model for rail maintenance and renewal planning.

  • Add prioritized replay memory to give high weight to important experiences of the agent.

  • Consider both predictive and condition-based maintenance tasks and related constraints.

Abstract

Developing optimal rail renewal and maintenance plans that minimize long-term costs and risks of failure is of paramount importance for the railroad industry. However, intrinsic uncertainty, the presence of constraints, and the curse of dimensionality make this a challenging engineering problem. Despite the potential capabilities of Deep Reinforcement Learning (DRL), there is very limited research on employing DRL methods to solve renewal and maintenance planning. Inspired by recent advances in DRL, a DRL-based approach is developed to optimize renewal and maintenance planning over a planning horizon by considering cost-effectiveness and risk reduction. We consider both predictive and condition-based maintenance tasks and incorporate time, resource, and related engineering constraints into the model to capture realistic features of the problem. Available historic inspection and maintenance data are used to simulate the rail environment and feed the DRL method. A Double Deep Q-Network (DDQN) is applied to cope with the uncertainty of the environment. In addition, prioritized replay memory is applied, which improves learning by giving high weight to important experiences of the agent. The proposed DDQN approach is applied to a Class I railroad network to demonstrate the applicability and efficiency of the approach. Our analyses demonstrate that the proposed approach develops an optimal policy that not only reduces budget consumption but also improves the reliability and safety of the network.

Introduction

The rising demand for rail transportation calls for efficient management of infrastructure to retain the service quality and safety of the network. Tracks are the backbone of railroad transportation and play a significant role in the safety of railroad operations. Optimal track renewal and maintenance planning, as a class of important engineering decision-making problems, supports infrastructure management and sustainable operation of the railroad system. That is, an optimal maintenance and renewal plan aims to minimize the total cost of maintenance and renewal while ensuring the safety and reliability of the rail network.

The two main tasks in infrastructure management are renewal and maintenance. Renewal is required when the condition of an infrastructure component is so poor that it cannot be restored by maintenance, or when repair is not economical due to its high cost. Maintenance is the set of all activities meant to keep a system in a condition where it can perform its function [1]. In general, maintenance tasks fall into three main categories: preventive, condition-based (predictive), and corrective.

Corrective Maintenance (CM): this type of maintenance is performed to restore a failed or malfunctioning item. Corrective maintenance tasks are usually unscheduled, since system failures are mostly unpredictable and unplanned. For the following reasons, this type of maintenance usually costs more than other types. First, because it is unplanned, arranging manpower and providing the needed parts are expensive. Second, sudden interruption of the system may cause safety and reliability issues; in the case of railways, derailments can result from system failures, causing casualties and damage to people and the industry. Third, unpredictable system failures can cause extensive damage to other units and lead to downtime and interruption in the network or the system.

Preventive Maintenance (PM): PM tasks are scheduled and pre-planned to reduce the probability and consequences of system failures and disruptions. The main advantage of PM is that it is planned and, in most cases, can be performed during the idle time of a system. Therefore, needed components can be ordered in time and crews scheduled in advance. In addition, some PM tasks can be performed while the system is operating.

Condition-based Maintenance (CBM): CBM is also called predictive maintenance, and some studies consider it a subcategory of PM. CBM tasks are performed according to the status or condition of items in the system. In CBM, monitoring methods or other techniques are used to predict possible degradation and discover the areas that need maintenance; based on the status of the items, required actions are then taken to prevent system failure and improve the current condition. Due to safety requirements and economic benefits, CBM has been attracting increasing attention.

An optimal renewal and maintenance plan aims to determine when and what type of maintenance or renewal task is needed for the different components of an infrastructure over a given period of time. Optimality refers to managing infrastructure conditions such as degradation while minimizing the risk of failure and the cost of operation. In this context, an optimal plan provides decision makers with an appropriate policy, i.e., a sequence of tasks to perform over a presumed time frame.
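In generic terms (a standard formalization, not necessarily the exact objective developed later in this paper), such a plan is a policy $\pi$ that solves

$$\pi^{*} = \arg\min_{\pi}\; \mathbb{E}\!\left[\sum_{t=0}^{T} \gamma^{t}\, c\big(s_{t}, \pi(s_{t})\big)\right],$$

where $s_t$ is the condition state of the infrastructure at time $t$, $\pi(s_t)$ the maintenance or renewal task selected in that state, $c(\cdot,\cdot)$ the incurred cost and risk, and $\gamma \in (0,1]$ a discount factor.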

Various approaches have been developed to solve the renewal and maintenance problem in railroad and other engineering applications. [2] conducted a comprehensive literature review on track maintenance planning, and [3] reviewed the application of big data in railroad transportation.

Mixed integer programming models are among the most widely developed methods to formulate and solve the renewal and maintenance planning problem [4]. These approaches apply operations research techniques to solve mathematical formulations and obtain the optimal plan when possible. Due to the unprecedented availability of data, data-driven methods are another class of approaches attracting growing attention. These methods employ machine learning and statistical learning to incorporate data-related features into operations research methods [5].

Since maintenance and renewal planning naturally involves sequential decision-making, the Markov Decision Process (MDP) is another class of approaches used to model this problem; MDPs have long been a powerful mathematical framework for modeling sequential decision-making [6]. Moreover, linear programming formulations and dynamic programming provide effective solution methods for MDPs. Therefore, MDPs have been embraced in maintenance planning and infrastructure asset management. Despite these advantages, MDPs are better suited to low-dimensional problems where the infrastructure under study does not involve large numbers of states and actions. In addition, the number of components of the infrastructure increases the complexity of the problem, which also makes MDPs impractical [7]. In other words, large numbers of states, actions, and components result in extremely high-dimensional transition matrices and increase the computational complexity of the problem. This complexity induces an intractable problem for conventional solution approaches. As another drawback, MDPs are not capable of reflecting all properties and dynamics of complicated infrastructure environments.
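To make the dimensionality argument concrete (standard MDP notation, added here for illustration), an MDP is the tuple

$$\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad P: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0,1],$$

so the transition model alone requires on the order of $|\mathcal{A}| \cdot |\mathcal{S}|^{2}$ entries. For a network of $n$ track segments, each with $m$ condition states, $|\mathcal{S}| = m^{n}$, and the transition tensor grows exponentially with the number of components; this is precisely the curse of dimensionality that renders exact dynamic programming intractable.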

Reinforcement Learning (RL) has demonstrated capabilities in modeling complex environments and handling relatively high-dimensional problems [8]. In RL, which can be defined as an MDP, an agent takes a sequence of actions in an environment to learn from its experience and maximize a reward function [9]. Classic RL methods have been applied to infrastructure maintenance and management in various engineering applications [10], [11], [12], [13], [14], [15]. Although RL can handle relatively high-dimensional problems, in complex, stochastic, and high-dimensional environments it shows limitations such as instability and divergence from optimal regions of the solution space [7].

Fast-paced developments in artificial intelligence, and specifically deep learning, have boosted algorithmic capabilities for approaching complex problems. Similarly, Deep Reinforcement Learning (DRL) has demonstrated an unprecedented capability to learn and solve high-dimensional and complex environments [16]. DRL benefits from deep neural network architectures to uncover a valid model of the problem through interaction with the environment. Moreover, DRL enables modeling realistic physical properties of engineering systems through the definition of environments.

DRL methods have recently been applied to maintenance planning problems in various engineering applications, although research in this area is still limited. Since advancements in DRL are relatively new, some studies employ Deep Q-Networks (DQN) [16], [17] to approach maintenance planning problems. [18] proposed a DQN to optimize maintenance policies of a multi-component infrastructure; this method uses images of the components as inputs to a Convolutional Neural Network (CNN) to approximate the Q-value function. Long-term pavement maintenance planning was studied in [19] to maximize long-term life-cycle cost-effectiveness; the authors applied a DQN model to provide a maintenance policy that maximizes cost-effectiveness. [20] proposed a DRL method to optimize the cost-effectiveness of preventive maintenance in serial production lines; to learn the optimal policy, they applied the DQN method, which also reduced the cost of maintenance policies. [21] presented a DQN model for condition-based maintenance planning of a multi-component system, also considering competing risks and economic dependencies in their model. A DRL method has also been developed to assess the reliability of structures [22]; the authors designed an experiment using a DRL method and proposed a new reward function to determine the moving direction of agents.

Other studies have developed advanced DRL approaches or considered variants of the MDP to model and solve optimal maintenance planning problems. [23] studied the problem of maintenance planning for a multi-state system over a finite time horizon; they formulated the problem as an MDP and employed the actor-critic algorithm, a DRL method, to cope with the high dimensionality caused by the uncountable number of states. [7] studied inspection and maintenance planning of a multi-component deteriorating system with very large state and action spaces; they modeled the problem as a partially observable MDP and proposed a customized multi-agent DRL method to provide efficient life-cycle policies for the system. In a similar study, [24] developed a joint framework of multi-agent DRL and partially observable MDPs to minimize the long-term risk and cost of inspection and maintenance of an engineering system; they also combined stochastic dynamic programming with Bayesian inference to address the curse of dimensionality. [25] proposed a hierarchical coordinated reinforcement learning model to solve the maintenance planning of multi-component systems, using a simulation approach to model system degradation. [26] formulated the problem of preventive maintenance and production scheduling optimization in a Markov decision process framework; they developed a reinforcement learning approach to solve the proposed model and used simulation to analyze its effectiveness.

Although DRL methods combined with MDPs have been developed for maintenance planning in some applications, to the best of our knowledge, the maintenance and renewal of railroad infrastructure, specifically track, has not been studied yet. We develop a DRL-based method to optimize rail maintenance and renewal planning over a given planning horizon. More specifically, rail renewal, periodic maintenance tasks, and condition-based maintenance tasks are considered in this model. Both maintenance and inspection data are used as inputs for the DRL method to simulate the track environment and develop indices that represent the environment. Moreover, we incorporate time, resource, and frequency constraints of maintenance into the DRL method, which provides a practical scenario of rail maintenance and renewal planning. Cost-effectiveness and a risk index are considered in the reward of the DRL method, which attempts to maximize cost-effectiveness and minimize the risk of hazards in the rail network.
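One common way to expose such time, resource, and frequency constraints to a value-based DRL agent is to mask infeasible actions before greedy selection. The sketch below illustrates the idea (our illustration under assumed placeholder costs and action names, not necessarily this paper's exact mechanism):

```python
import numpy as np
import torch

# Placeholder action costs: do-nothing, grind, tamp, renew (hypothetical values).
COSTS = np.array([0.0, 10.0, 25.0, 200.0])

def feasible_mask(budget_left: float, last_done: np.ndarray, t: int,
                  min_gap: int = 6) -> torch.Tensor:
    """Hypothetical feasibility check: an action is allowed only if enough
    budget remains (resource constraint) and the minimum interval since it
    was last performed has elapsed (frequency constraint)."""
    mask = (COSTS <= budget_left) & ((t - last_done) >= min_gap)
    mask[0] = True  # doing nothing is always feasible
    return torch.from_numpy(mask)

def greedy_feasible_action(q_net, state: torch.Tensor, mask: torch.Tensor) -> int:
    """Pick the highest-valued action among the feasible ones only."""
    with torch.no_grad():
        q = q_net(state.unsqueeze(0)).squeeze(0)
    q[~mask] = -float("inf")  # infeasible actions can never be selected
    return int(q.argmax())
```

Masking keeps the learned Q-function unconstrained while guaranteeing that the executed policy never violates the engineering constraints.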

In terms of the DRL algorithm, we employ the Double DQN (DDQN) algorithm, which, unlike DQN, avoids overestimating the action value [27]. In the case of rail renewal, the action reward, which depends on the recovery rate of maintenance actions, has been shown to be uncertain [4]; therefore, applying DDQN controls the possible overestimation of action values. In addition, we apply prioritized replay memory, which ranks the agent's experiences in terms of importance and enables the DDQN to replay important transitions more often [28].
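A minimal PyTorch sketch of these two ingredients, the Double DQN target [27] and proportional prioritized replay [28], is given below. The network size, hyper-parameters, and transition layout are illustrative assumptions, not the implementation used in this paper:

```python
import numpy as np
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected Q-network (hypothetical architecture)."""
    def __init__(self, n_state_features: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

class PrioritizedReplay:
    """Proportional prioritized replay: transitions are sampled with
    probability proportional to |TD error|^alpha, so informative
    experiences are replayed more often."""
    def __init__(self, capacity=50_000, alpha=0.6):
        self.data, self.prio = [], []
        self.capacity, self.alpha = capacity, alpha

    def push(self, transition):
        self.data.append(transition)
        self.prio.append(max(self.prio, default=1.0))  # new items: max priority
        if len(self.data) > self.capacity:
            self.data.pop(0); self.prio.pop(0)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.prio) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        w = (len(self.data) * p[idx]) ** (-beta)  # importance-sampling weights
        w /= w.max()
        return [self.data[i] for i in idx], idx, torch.as_tensor(w, dtype=torch.float32)

    def update(self, idx, td_errors, eps=1e-5):
        for i, e in zip(idx, td_errors):
            self.prio[i] = abs(float(e)) + eps

def ddqn_td_error(online, target, batch, gamma=0.99):
    """Double DQN: the online net *selects* the next action, the target net
    *evaluates* it, which curbs the overestimation bias of plain DQN."""
    s, a, r, s2, done = (torch.as_tensor(np.array(x), dtype=torch.float32)
                         for x in zip(*batch))
    q = online(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_a = online(s2).argmax(dim=1, keepdim=True)
        y = r + gamma * (1 - done) * target(s2).gather(1, next_a).squeeze(1)
    return y - q  # per-transition TD error

# One training step (sketch): importance-weighted loss, then refresh priorities.
# batch, idx, weights = memory.sample(32)
# td = ddqn_td_error(online, target, batch)
# loss = (weights * td.pow(2)).mean()
# memory.update(idx, td.detach())
```

The TD errors feed back into the replay memory, so the priorities and the importance-sampling weights stay consistent with the agent's current estimates.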

Finally, the proposed DRL approach provides a decision-making tool for railroad agencies to optimize rail renewal and maintenance planning. Using available historic data, considering a practical realization of the problem, and employing advanced DRL algorithms enable this approach to provide optimized and practical solutions for railroad agencies. The remainder of this study is organized as follows. Section 2 briefly explains RL and Q-learning. The proposed methodology and the problem description are elaborated in Section 3. Section 4 presents the case study and results analysis. Finally, Section 5 draws conclusions.

Section snippets

Reinforcement learning; Q-learning

Reinforcement Learning (RL) is a subfield of machine learning that addresses the problem of sequential decision-making to maximize a reward. RL is learning what to do, and how to map states to actions, so as to maximize a numerical reward signal [29].

Although RL can be applied to model and solve a wide variety of problems, most problems to be solved by RL are inherently modeled as sequential decision-making processes. The Markov Decision Process (MDP) is a very powerful concept and method to model such sequential decision-making problems.
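For reference, the tabular Q-learning update, in the standard form of [29], is

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],$$

where $\alpha$ is the learning rate and $\gamma$ the discount factor; DQN-family methods replace the table $Q(\cdot,\cdot)$ with a neural network approximator.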

Problem description

We study the problem of track maintenance and renewal planning over a specific planning horizon. In other words, the problem is to determine an optimal maintenance and renewal policy for a railroad network considering cost-effectiveness and hazard-related metrics. The optimal policy is a sequence of decisions that determines when and what type of maintenance or renewal action needs to be performed on each segment of the railroad network. The main maintenance tasks are tamping, grinding, and track renewal.
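As an illustration of how such a problem maps onto an RL environment, the skeleton below encodes one possible per-segment interface, with a degradation index as state and do-nothing, grinding, tamping, and renewal as actions. The state variables, recovery effects, costs, and reward weights are hypothetical placeholders, not this paper's calibrated model:

```python
import numpy as np

class TrackEnv:
    """Hypothetical per-segment track environment: state is a degradation
    index in [0, 1] (higher = worse); each action trades repair cost
    against condition recovery and failure risk."""
    ACTIONS = ("do_nothing", "grind", "tamp", "renew")
    COST = np.array([0.0, 10.0, 25.0, 200.0])    # placeholder action costs
    RECOVERY = np.array([0.0, 0.1, 0.25, 1.0])   # placeholder recovery rates

    def __init__(self, degradation_rate=0.02, risk_weight=50.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.degradation_rate = degradation_rate
        self.risk_weight = risk_weight
        self.reset()

    def reset(self):
        self.condition = self.rng.uniform(0.0, 0.5)
        return np.array([self.condition], dtype=np.float32)

    def step(self, action: int):
        # Apply the action's recovery, then stochastic degradation.
        self.condition = max(0.0, self.condition - self.RECOVERY[action])
        self.condition = min(1.0, self.condition
                             + self.rng.exponential(self.degradation_rate))
        risk = self.condition ** 2  # placeholder hazard proxy
        reward = -(self.COST[action] + self.risk_weight * risk)
        return np.array([self.condition], dtype=np.float32), reward, False, {}
```

The reward trades maintenance spending against a risk penalty, mirroring the cost-effectiveness and hazard metrics described above.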

Case study and results

This section is devoted to a case study to examine the capability of the model in providing railroad agencies with an optimized maintenance and renewal plan.

Conclusion

We studied the problem of rail renewal and maintenance planning over a given planning horizon. Considering the advancements in deep learning and Deep Reinforcement Learning (DRL) and the availability of inspection and maintenance data in the railroad industry, we developed a DRL-based method that provides railroad agencies with an optimal maintenance policy. The objective of this method is to maximize cost-effectiveness and minimize the risk of failure in the network. We use various

CRediT authorship contribution statement

Reza Mohammadi: Methodology, Coding, Writing – original draft, Visualization. Qing He: Conceptualization, Project administration, Funding acquisition, Validation, Investigation, Writing – review & editing.

Acknowledgments

This study was partially funded by the National Natural Science Foundation of China (NSFC) under Grant No. U1934214.

References
