Deep Reinforcement Learning for Solving Train Unit Shunting Problem with Interval Timing

Lee, Wan-Jui; Jamshidi, Helia; Roijers, Diederik M.

doi:10.1007/978-3-030-58462-7_9

Wan-Jui Lee, Helia Jamshidi & Diederik M. Roijers

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1279)

Included in the following conference series:

European Dependable Computing Conference

Abstract

The Train Unit Shunting Problem (TUSP) is a hard combinatorial optimization problem faced by the Dutch Railways (NS). An earlier study showed that the parking and matching sub-problem of TUSP can potentially be solved by formulating it as a Markov Decision Process and employing a deep reinforcement learning algorithm to learn a strategy. However, that study did not take into account service tasks, which are a key component of TUSP. Service tasks introduce additional time constraints, making the application even more challenging to tackle.

In this paper, we formulate the time constraints of service tasks within TUSP to enable deep reinforcement learning. Using this new formalization, we compare two learning strategies, DQN and VIPS, to determine which is more suitable for this application. The results show that, when the agent is given extra triggers at fixed time intervals, it accurately learns with VIPS to send the trains to the service tracks in time to comply with the departure schedule.

Acknowledgements

This research was in part supported by funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen”.

Author information

Authors and Affiliations

R&D Hub Logistics, Dutch Railways, Utrecht, The Netherlands
Wan-Jui Lee
CiTG, TU Delft, Delft, The Netherlands
Helia Jamshidi
Microsystems Technology, HU University of Applied Sciences Utrecht, Utrecht, The Netherlands
Diederik M. Roijers
AI Research Group, Vrije Universiteit Brussel, Brussels, Belgium
Diederik M. Roijers

Corresponding author

Correspondence to Wan-Jui Lee.

Editor information

Editors and Affiliations

University of Zaragoza, Zaragoza, Spain
Simona Bernardi
University of Naples Federico II, Naples, Italy
Valeria Vittorini
Linnaeus University, Växjö, Sweden
Francesco Flammini
University of Reggio Calabria, Reggio Calabria, Italy
Roberto Nardone
University of Naples Federico II, Naples, Italy
Stefano Marrone
Fraunhofer IESE, Kaiserslautern, Germany
Rasmus Adler
Fraunhofer IESE, Kaiserslautern, Germany
Daniel Schneider
Fraunhofer IKS, Munich, Germany
Philipp Schleiß
Resiltech s.r.l., Pontedera, Italy
Nicola Nostro
Aalborg University, Aalborg, Denmark
Rasmus Løvenstein Olsen
University of L'Aquila, L'Aquila, Italy
Amleto Di Salle
National Institute of Aerospace, Langley Research Center, Hampton, USA
Paolo Masci

About this paper

Cite this paper

Lee, WJ., Jamshidi, H., Roijers, D.M. (2020). Deep Reinforcement Learning for Solving Train Unit Shunting Problem with Interval Timing. In: Bernardi, S., et al. Dependable Computing - EDCC 2020 Workshops. EDCC 2020. Communications in Computer and Information Science, vol 1279. Springer, Cham. https://doi.org/10.1007/978-3-030-58462-7_9

DOI: https://doi.org/10.1007/978-3-030-58462-7_9
Published: 31 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58461-0
Online ISBN: 978-3-030-58462-7
eBook Packages: Computer Science, Computer Science (R0)
