
Deep Reinforcement Learning for Solving Train Unit Shunting Problem with Interval Timing

  • Conference paper
Dependable Computing - EDCC 2020 Workshops (EDCC 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1279)


Abstract

The Train Unit Shunting Problem (TUSP) is a hard combinatorial optimization problem faced by the Dutch Railways (NS). An earlier study has shown the potential of solving the parking and matching sub-problem of TUSP by formulating it as a Markov Decision Process and employing a deep reinforcement learning algorithm to learn a strategy. However, that study did not take service tasks into account, even though they are a key component of TUSP. Service tasks inject additional time constraints, making the problem even more challenging to tackle.
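As a rough illustration of such a formulation, the sketch below casts a toy parking-and-matching instance as a sequential decision process with states, actions, and rewards. All names (YardState, ShuntingMDP, step) and the reward scheme are hypothetical simplifications for illustration only, not the formulation used in the paper.

```python
# Illustrative sketch only: a toy MDP-style environment for a parking-and-matching
# sub-problem. Names and reward scheme are hypothetical, not the authors' model.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class YardState:
    """Tracks are modelled as LIFO stacks of train-unit type ids."""
    tracks: List[List[int]]
    arrivals: List[int]      # train-unit types still to arrive, in order
    departures: List[int]    # train-unit types requested to depart, in order


@dataclass
class ShuntingMDP:
    """Minimal environment: park each arriving unit on a track, and serve
    departures whenever a matching unit sits at the head of some track."""
    state: YardState

    def actions(self) -> List[int]:
        # An action is the index of the track on which to park the next arrival.
        return list(range(len(self.state.tracks)))

    def step(self, track: int) -> Tuple[YardState, float, bool]:
        unit = self.state.arrivals.pop(0)
        self.state.tracks[track].append(unit)
        reward = 0.0
        # Serve departures greedily while a matching unit is accessible.
        while self.state.departures:
            wanted = self.state.departures[0]
            heads = [t for t in self.state.tracks if t and t[-1] == wanted]
            if not heads:
                break
            heads[0].pop()
            self.state.departures.pop(0)
            reward += 1.0    # reward each departure that can be served
        done = not self.state.arrivals
        return self.state, reward, done
```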

In this paper, we formulate the time constraints of service tasks within TUSP so that deep reinforcement learning can be applied. Using this new formulation, we compare two learning strategies, DQN and VIPS, to evaluate which is most suitable for this application. The results show that, by assigning extra decision triggers to the agent at fixed time intervals, the VIPS-based agent learns to send trains to the service tracks in time to comply with the departure schedule.
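The sketch below illustrates, under stated assumptions, one way such fixed-interval decision triggers could be injected into an episode roll-out: the simulation pauses at regular intervals and the agent chooses between waiting and moving a train to a service track. TRIGGER_INTERVAL, service_moves, q_net, and the wait action are hypothetical names; this is neither the paper's VIPS nor its DQN implementation, only a generic value-based roll-out.

```python
# Illustrative sketch only (hypothetical names): extra decision triggers at fixed
# time intervals let the agent decide *when* to send a train to a service track.
import random

TRIGGER_INTERVAL = 5        # minutes between extra decision points (assumed)
WAIT = "wait"               # no-op action available at a trigger


def run_episode(env, q_net, horizon: int, epsilon: float = 0.1):
    """Roll out one episode, pausing every TRIGGER_INTERVAL minutes so the agent
    can choose to move a train to a service track or keep waiting."""
    state = env.reset()
    total_reward = 0.0
    for t in range(0, horizon, TRIGGER_INTERVAL):
        actions = [WAIT] + env.service_moves(state)   # candidate service moves
        if random.random() < epsilon:
            action = random.choice(actions)           # explore
        else:
            action = max(actions, key=lambda a: q_net.value(state, a))
        state, reward, done = env.step(action, duration=TRIGGER_INTERVAL)
        total_reward += reward    # e.g. penalties for missed service deadlines
        if done:
            break
    return total_reward
```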

Acknowledgements

This research was in part supported by funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen”.

Author information

Corresponding author

Correspondence to Wan-Jui Lee.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Lee, WJ., Jamshidi, H., Roijers, D.M. (2020). Deep Reinforcement Learning for Solving Train Unit Shunting Problem with Interval Timing. In: Bernardi, S., et al. Dependable Computing - EDCC 2020 Workshops. EDCC 2020. Communications in Computer and Information Science, vol 1279. Springer, Cham. https://doi.org/10.1007/978-3-030-58462-7_9

  • DOI: https://doi.org/10.1007/978-3-030-58462-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58461-0

  • Online ISBN: 978-3-030-58462-7

  • eBook Packages: Computer Science, Computer Science (R0)
