
Information Sciences

Volume 570, September 2021, Pages 708-721

Deep reinforcement learning with reference system to handle constraints for energy-efficient train control

https://doi.org/10.1016/j.ins.2021.04.088

Abstract

Energy-efficient train control involves complicated optimization processes subject to constraints such as speed, time, position and comfort requirements. Conventional optimization techniques are not adept at accumulating numerous solution instances into decision intelligence that can be reused, through learning, for the new problems confronted in succession. Deep reinforcement learning (DRL), which can directly output control decisions based on current states, has shown great potential for next-generation intelligent control. However, when DRL is applied directly to energy-efficient train control, the results are largely unsatisfactory: the agent becomes confused about how to trade off the constraints and spends considerable computational time on a large number of meaningless explorations. This article proposes an approach of DRL with a reference system (DRL-RS) for proactive constraint handling, where the reference system checks and corrects the agent's learning progress to keep it from straying ever further down an erroneous path. The proposed approach is evaluated by numerical experiments on train control in metro lines. The experimental results demonstrate that DRL-RS achieves faster learning convergence than directly applied DRL, and reduces energy consumption more than the commonly used genetic algorithm.

Introduction

Railway transportation is one of the major energy demand sectors, and various measures have been taken to save energy in this sector [1]. In particular, urban rail transit (URT) has gradually become the backbone of public transportation systems in cities, as a convenient and environmentally friendly mode of transport [2]. With the booming growth of URT, its huge electric energy consumption has become an increasingly prominent problem, making energy saving and emission reduction imperative. The railroad industry continues to look for ways to improve the safety and energy efficiency of railway transport systems to maintain sustainable development, which depends on advanced train control equipment. The automatic train operation (ATO) system equipped in a train is capable of controlling the train automatically by tracking target speed profiles and adjusting train control strategies. To enhance the performance of ATO, the design of high-level energy-efficient speed profiles for train operation control is an important function of the onboard control equipment.

Energy-efficient train control was initially studied on the basis of optimal control theory, in particular the Pontryagin maximum principle [3], [4], [5], [6], [7], [8]. The optimized control sequence is a combination of distinct phases, such as maximum traction, cruising, coasting and maximum braking, during train operations. Cheng and Howlett [3] proposed an approach to generate a feasible driving strategy towards optimality by adjusting adjoint parameters. Khmelnitsky [4] put forward an approach to obtain energy-efficient trajectories on variable gradient profiles subject to arbitrary speed restrictions using complementary optimality conditions. Liu and Golovitcher [5] developed an algorithm to obtain analytical information on optimal operation states and their sequences in optimal speed profiles. Howlett et al. [6] presented key equations and critical speeds to calculate key switching points so as to obtain a satisfactory and feasible solution for energy saving. Albrecht et al. [7], [8] put forward a generalized model for optimal train control and proved the existence of optimal switching points. Considering piecewise constant gradients, they established algebraic formulae for the adjoint variables, identified general bounds on the positions of optimal switching points, and analyzed integral forms of the necessary conditions for optimal switching.

Various intelligent algorithms have been applied to obtain the optimal combination sequence and exact switching points of driving regimes for different operation situations and train types [9]. Chang and Sim [10] attempted to utilize a genetic algorithm (GA) to solve the train trajectory optimization problem, with a fitness function defined as the weighted sum of energy consumption, operation punctuality and riding comfort. Wong and Ho [11] proposed a hierarchical genetic algorithm to determine the appropriate number of coasting points by inspecting their corresponding train movement performance. Acikbas and Soylemez [12] proposed a combined application of artificial neural networks (NNs) and GA to find optimal coasting points for multiple trains on multiple lines, taking regenerative braking energy into account. Sicre et al. [13] presented a GA approach with fuzzy parameters to realize energy-efficient railway traffic regulation when a significant delay occurs, making full use of the regenerative energy generated by train braking. Other intelligent optimization algorithms, such as dual heuristic programming (DHP) [14], fuzzy optimization [15], [16], ant colony optimization (ACO) [17], particle swarm optimization (PSO) [18], [19], and regression trees [20], have also been employed for the design and analysis of energy-efficient train speed profiles.

The true energy consumption depends on the realistic train control tracking performance [15], [16]. How to make the ATO adopt energy-efficient control actions is an intelligent decision problem based on current train running states. In recent years, deep reinforcement learning (DRL), combining the advantages of deep learning and reinforcement learning, has come to be considered one of the most promising technologies in machine learning [21], [22]. It has shown great potential in practical engineering applications, especially in the control of intelligent systems [23]. Belletti et al. [24] presented a DRL approach to achieve adaptive and robust control for traffic management systems, and developed a novel multi-agent control algorithm for large cyber-physical systems. Li et al. [25] proposed an improved deep Q-network (DQN) strategy to learn an effective human–machine cooperative driving scheme, which can avoid potential pedestrian crossing collisions and thereby achieve higher safety; by applying two replay memory buffers, the learning of the optimal driving policy can be shortened. Tong et al. [26], [27] proposed DRL-based algorithms to deal with task scheduling and resource allocation in cloud computing and mobile edge computing environments, respectively. Martinez et al. [28] developed a DRL algorithm to realize adaptive decisions on the early classification of temporal sequences. Wang et al. [29] put forward a DRL-based method for dynamic resource allocation in network slicing to respond to the huge challenges posed by massive data in various applications. Wu et al. [30] proposed a dueling-DQN learning method, autonomous navigation and obstacle avoidance (ANOA), for unmanned surface vehicles. Other practical engineering applications of DRL, in communications [31], power systems [32] and beyond, also demonstrate its inspiring abilities.

However, as far as we know, few works have focused on applying DRL to the energy-efficient trajectory optimization of train ATO systems. Train operation control is a multi-objective optimization problem concerning safety, punctuality, passenger comfort and energy saving, which is not easy for an agent to learn. To support complex functionalities, train control systems may involve even more intricate constraints. Constraint handling in DRL is generally incorporated into the reward function design. For example, Li et al. [33] designed a reward function of DRL for motion planning of free-floating dual-arm space manipulators, to avoid violations of end-effector velocity constraints and self-collision between the two arms, in addition to controlling the distance to the capturing point. Attention has also been paid to safe exploration in DRL, to shorten learning time and prevent damaging outcomes in case of constraint violations in online practical applications. Kou et al. [34] added a safety layer directly on top of the actor network of one kind of DRL, deep deterministic policy gradient (DDPG); this safety layer solves a constrained optimization problem to guarantee voltage constraints in power distribution networks. Andersen et al. [35] proposed a dreaming variational autoencoder (DVAE) for safe learning of DQN, which combines risk evaluation of an action with a safe predictive model. Applying DRL to practical systems may thus entail interventions that modulate the learning process for proactive constraint satisfaction. From a new perspective on learning frameworks, this paper proposes a DRL with a reference system (DRL-RS) to proactively handle constraints for energy-efficient train control. We mainly focus on the extension of two typical DRL approaches, i.e., DQN and DDPG. The reference system acts as an expert intervening in agent–environment interactions to improve the action selection policy.
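To make the role of the reference system concrete, the following Python sketch illustrates one way such a component could sit between a DRL agent and its environment: each proposed action is checked with a one-step lookahead against speed-limit and stopping constraints and, if necessary, replaced with a corrective braking action. The function names, thresholds and correction rule here are illustrative assumptions, not the exact mechanism developed later in the paper.

```python
# Minimal sketch (assumed interface, not the authors' exact algorithm):
# a reference system intervening between a DRL agent and its environment.

def reference_correct(speed, position, action, dt, v_limit, stop_pos, u_min):
    """Check a proposed unit-mass force and return (possibly corrected) action."""
    next_speed = max(0.0, speed + action * dt)               # one-step lookahead
    braking_distance = next_speed ** 2 / (2.0 * abs(u_min))  # distance needed to stop
    if next_speed > v_limit:                                 # speed-limit constraint
        return u_min, True
    if position + next_speed * dt + braking_distance > stop_pos:
        return u_min, True                                   # brake early enough to stop
    return action, False

def run_episode(agent_policy, env_step, dt=1.0, v_limit=22.0,
                stop_pos=1500.0, u_min=-1.0):
    """Interaction loop: agent proposes, reference system checks, environment executes."""
    speed, position, t = 0.0, 0.0, 0.0
    while position < stop_pos:
        proposed = agent_policy(speed, position, t)
        action, corrected = reference_correct(speed, position, proposed,
                                              dt, v_limit, stop_pos, u_min)
        speed, position, reward = env_step(speed, position, action, dt)
        t += dt
        # the 'corrected' flag can additionally be fed back as a reward penalty
```

Here `agent_policy` and `env_step` stand in for the DQN/DDPG policy and the train dynamics model, respectively; the point of the sketch is only that constraint checking happens before, rather than after, an action reaches the environment.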

The remainder of this paper is organized as follows. In Section 2, the problem of energy-efficient train operations is formulated. Section 3 presents the entire methodology of DRL-RS. Section 4 demonstrates the numerical experiments and results. Finally, the conclusions are drawn in Section 5.

Section snippets

Problem statement

The external forces that a train is subject to include tractive or braking force and additional operational resistances. The basic equation of train movement can be described by

$$m\frac{\mathrm{d}^2 x_i}{\mathrm{d}t^2} = U_i - r(v_i) - m g \sin\big(\theta(x_i)\big), \qquad \frac{\mathrm{d}x_i}{\mathrm{d}t} = v_i,$$

where $m$ is the train mass, $x_i$ and $v_i$ are the train position and speed at the current instant $i$, respectively, $U_i$ is the external tractive or braking force at the current instant $i$, and $r(v_i)$ is the basic operational resistance, generally described by the Davis equation $r(v_i) = a + b v_i + c v_i^2$ [36],
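For a computational view of this model, the sketch below integrates the movement equation with a simple explicit Euler scheme. The resistance coefficients, gradient profile and control policy used here are illustrative placeholders rather than the calibration used in the experiments; only the train mass is taken from the simulation conditions reported later.

```python
import math

# Minimal sketch: explicit Euler integration of
#   m * dv/dt = U - r(v) - m * g * sin(theta(x)),   dx/dt = v
# with illustrative (not calibrated) Davis coefficients in newtons.

def davis_resistance(v, a=2000.0, b=40.0, c=0.6):
    """Basic running resistance r(v) = a + b*v + c*v**2 (placeholder values)."""
    return a + b * v + c * v * v

def simulate(control, m=2.8e5, dt=0.5, t_end=120.0, gradient=lambda x: 0.0):
    """Simulate train motion under a control law U = control(t, x, v) in newtons."""
    g = 9.81
    x, v, t = 0.0, 0.0, 0.0
    trajectory = [(t, x, v)]
    while t < t_end:
        U = control(t, x, v)
        dvdt = (U - davis_resistance(v) - m * g * math.sin(gradient(x))) / m
        v = max(0.0, v + dvdt * dt)        # speed cannot become negative
        x += v * dt
        t += dt
        trajectory.append((t, x, v))
    return trajectory

# Example: maximum traction (0.8 m/s^2 per unit mass) for 30 s, then coasting
profile = simulate(lambda t, x, v: 0.8 * 2.8e5 if t < 30.0 else 0.0)
```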

DRL-RS structure

The external force, the current speed, the current position, and the current running time are the four essential variables that are related to and restricted by one another in train operation control processes. The speed and distance are the cumulative results of the external forces acting on a train over time. In turn, the choice of an external force is also influenced by the states of the other three variables. Moreover, when a train reaches the destination, the speed should reach a specified value such as zero,
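As one possible concrete encoding of these four coupled variables, the sketch below normalizes them into a state vector that could be fed to a DQN or DDPG network; the actual features, normalization and network inputs used in the paper are not detailed in this snippet, so this form is only an assumption.

```python
from dataclasses import dataclass

# Illustrative (assumed) state representation built from the four variables
# named above: external force, speed, position and running time.

@dataclass
class TrainState:
    force: float      # current unit-mass tractive/braking force (m/s^2)
    speed: float      # current speed (m/s)
    position: float   # distance travelled from the departure station (m)
    run_time: float   # elapsed running time (s)

def to_features(s: TrainState, u_max: float, v_max: float,
                total_distance: float, schedule_time: float) -> list[float]:
    """Scale the raw state into roughly unit-range features for the network."""
    return [s.force / u_max,
            s.speed / v_max,
            s.position / total_distance,
            s.run_time / schedule_time]
```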

Simulation conditions

In order to evaluate the proposed DRL-RS algorithms, we collected practical data from a metro line to perform numerical simulations. These data include train parameters and railway line data. The train parameters include train mass m = 2.8 × 10⁵ kg, train length Ltr = 110 m, minimum unit-mass braking force Umin = -1 m/s², maximum unit-mass tractive force Umax = 0.8 m/s², and running resistance coefficients a = 2.75, b = 1.4 × 10⁻², c = 7.5 × 10⁻⁴. Railway line data include stations, line length,
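For reference, the train parameters listed above can be gathered into a single configuration structure, as in the sketch below; the railway line data (stations, speed limits, gradients and so on) are truncated in this snippet and are therefore omitted.

```python
# Train parameters from the simulation conditions above, collected into one
# configuration dictionary. Units follow the paper's conventions; the field
# names are illustrative.

train_params = {
    "mass_kg": 2.8e5,       # train mass m
    "length_m": 110.0,      # train length Ltr
    "u_min_mps2": -1.0,     # minimum unit-mass braking force Umin
    "u_max_mps2": 0.8,      # maximum unit-mass tractive force Umax
    "davis_a": 2.75,        # running resistance coefficients a, b, c
    "davis_b": 1.4e-2,
    "davis_c": 7.5e-4,
}
```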

Conclusions

In this study, a DRL-RS has been proposed for energy-efficient train control to deal with complicated constraints. The role of the reference system is to judge whether the actions and their accumulated effects will bring about constraint violations. In its way of dealing with constraints, purely reward-based RL is somewhat similar to the penalty function approach for compromising among multiple objectives. Therefore, a proactive constraint handling system is incorporated into DRL to adjust learning directions and

CRediT authorship contribution statement

Mengying Shang: Software, Investigation, Writing - original draft. Yonghua Zhou: Conceptualization, Writing - review & editing, Supervision. Hamido Fujita: Conceptualization, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was financially supported by the Beijing Natural Science Foundation under Grant L191017, the National Natural Science Foundation of China under Grant 61673049, and the Fundamental Research Funds for the Central Universities of China under Grant 2020YJS007.

References (42)

• J. Yin et al., Smart train operation algorithms based on expert knowledge and ensemble CART for the electric locomotive, Knowl. Based Syst. (2016)
• J. Li et al., Deep reinforcement learning for pedestrian collision avoidance and human-machine cooperative driving, Inf. Sci. (2020)
• Z. Tong et al., A scheduling scheme in the cloud computing environment using deep Q-learning, Inf. Sci. (2020)
• Z. Tong et al., Adaptive computation offloading and resource allocation strategy in a mobile edge computing environment, Inf. Sci. (2020)
• H. Wang et al., Data-driven dynamic resource scheduling for network slicing: A deep reinforcement learning approach, Inf. Sci. (2019)
• P. Andersen et al., Towards safe reinforcement-learning in industrial grid-warehousing, Inf. Sci. (2020)
• Y. Yuan et al., A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning, Knowl. Based Syst. (2019)
• X. Qi et al., Deep reinforcement learning enabled self-learning control for energy efficient driving, Transp. Res. C (2019)
• S. Brandi et al., Deep reinforcement learning to optimise indoor temperature control and heating energy consumption in buildings, Energy Build. (2020)
• Q. Gu et al., Energy-efficient train tracking operation based on multiple optimization models, IEEE Trans. Intell. Transp. Syst. (2016)
• J. Cheng et al., A note on the calculation of optimal strategies for the minimization of fuel consumption in the control of trains, IEEE Trans. Autom. Control (1993)