Abstract
When deep reinforcement learning (DRL) is used to solve train operation control in urban railways, agents encounter complex and dynamic environments with sparse rewards. It is therefore crucial to alleviate the negative impact of sparse rewards on finding the optimal trajectory. This paper introduces a novel algorithm called Allocating Spare Time and Planning Speed Intervals (ASTPSI), which dramatically reduces the blindness of exploration of intelligent train agents under sparse rewards and significantly improves their learning efficiency and operation quality. Combined with different DRL algorithms, ASTPSI can generate real-time train trajectories that meet the requirements. To evaluate the algorithm’s performance, we verified the convergence rate of ASTPSI-DRL when optimizing train trajectories under sparse rewards on a real track. ASTPSI-DRL shows better performance and stability than genetic algorithms and the original DRL algorithms in terms of energy consumption, punctuality, and stopping accuracy.
Acknowledgment
This work was supported by the Graduate Student Research Innovation Program of Chongqing Jiaotong University (CYS23515).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, H., Xian, G. (2024). ASTPSI: Allocating Spare Time and Planning Speed Interval for Intelligent Train Control of Sparse Reward. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14447. Springer, Singapore. https://doi.org/10.1007/978-981-99-8079-6_6
DOI: https://doi.org/10.1007/978-981-99-8079-6_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8078-9
Online ISBN: 978-981-99-8079-6
eBook Packages: Computer Science (R0)