Routing in Reinforcement Learning Markov Chains

Moll, Maximilian; Weller, Dominic

doi:10.1007/978-3-031-08623-6_60


Part of the book series: Lecture Notes in Operations Research (LNOR)

Included in the following conference series:

International Conference on Operations Research


Abstract

With computers beating human players in challenging games such as Chess, Go, and StarCraft, Reinforcement Learning has recently gained much attention. This growing, data-driven approach to control theory has produced various promising algorithms that combine simulation for data generation, optimization, and often bootstrapping. Underlying all of them, however, is the assumption that the problem can be cast as a Markov Decision Process, which extends the usual Markov Chain by assigning controls and resulting rewards to each potential transition. This assumption implies that the underlying Markov Chain and the reward, the data equivalent of an inverse cost function, form a weighted network. Consequently, the optimization problem in Reinforcement Learning can be translated into a routing problem on such possibly immense and largely unknown networks. This paper analyzes this novel interpretation and provides some first approaches to solving it.
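The abstract's central observation, that a Markov Decision Process together with its rewards forms a weighted network so that finding an optimal policy becomes a routing problem, can be illustrated in the simplest possible setting. The following is a minimal sketch, not the authors' method. It assumes a small, fully known, deterministic MDP, undiscounted episodic returns, and non-positive rewards, so that the edge cost -r is non-negative and Dijkstra's algorithm applies. The names `route`, `value_iteration`, and `TOY_MDP` are hypothetical. For comparison, the optimal value computed by plain value iteration should equal the negated shortest-path cost.

```python
import heapq
import itertools
from typing import Dict, Hashable, List, Optional, Tuple

State = Hashable
# Hypothetical deterministic MDP: mdp[state][action] = (next_state, reward).
# Assumption: every reward is <= 0, so cost = -reward is non-negative.
MDP = Dict[State, Dict[str, Tuple[State, float]]]

TOY_MDP: MDP = {
    "A": {"right": ("B", -1.0), "down": ("C", -4.0)},
    "B": {"right": ("D", -5.0), "down": ("C", -1.0)},
    "C": {"right": ("D", -1.0)},
}  # "D" is the terminal goal state


def route(mdp: MDP, start: State, goal: State) -> Tuple[float, List[State]]:
    """Dijkstra on the transition graph with edge cost -reward.

    Returns (total return, visited states) of the best route to goal.
    """
    dist: Dict[State, float] = {start: 0.0}
    prev: Dict[State, State] = {}
    tie = itertools.count()  # tie-breaker so states are never compared
    heap = [(0.0, next(tie), start)]
    done = set()
    while heap:
        d, _, s = heapq.heappop(heap)
        if s in done:
            continue
        done.add(s)
        if s == goal:
            break
        for nxt, reward in mdp.get(s, {}).values():
            if reward > 0:
                raise ValueError("positive reward violates the Dijkstra assumption")
            nd = d - reward  # accumulated cost = negated return
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = s
                heapq.heappush(heap, (nd, next(tie), nxt))
    if goal not in done:
        raise ValueError("goal is unreachable from start")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return -dist[goal], path[::-1]


def value_iteration(mdp: MDP, goal: State,
                    sweeps: Optional[int] = None) -> Dict[State, float]:
    """Undiscounted Bellman optimality sweeps with the goal as terminal state."""
    states = set(mdp) | {ns for acts in mdp.values() for ns, _ in acts.values()}
    V = {s: (0.0 if s == goal else float("-inf")) for s in states}
    # |S| in-place sweeps suffice here: with deterministic transitions and
    # rewards <= 0 the update behaves like Bellman-Ford relaxation.
    for _ in range(sweeps or len(states)):
        for s in states:
            if s == goal or not mdp.get(s):
                continue
            V[s] = max(r + V[ns] for ns, r in mdp[s].values())
    return V


if __name__ == "__main__":
    ret, path = route(TOY_MDP, "A", "D")
    V = value_iteration(TOY_MDP, "D")
    print(ret, path, V["A"])
```

On the toy MDP both calls agree on a return of -3 along A, B, C, D, although the direct edge B to D looks attractive. Stochastic transitions, positive rewards, or a network that must first be explored break the assumptions of this sketch.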



Author information

Authors and Affiliations

Universität der Bundeswehr München, Werner-Heisenberg-Weg 39, 85577, Neubiberg, Germany
Maximilian Moll & Dominic Weller


Corresponding author

Correspondence to Maximilian Moll.

Editor information

Editors and Affiliations

Department of Business Administration, University of Bern, Bern, Switzerland
Norbert Trautmann
Department of Business Administration, University of Bern, Bern, Switzerland
Mario Gnägi


About this paper

Cite this paper

Moll, M., Weller, D. (2022). Routing in Reinforcement Learning Markov Chains. In: Trautmann, N., Gnägi, M. (eds) Operations Research Proceedings 2021. OR 2021. Lecture Notes in Operations Research. Springer, Cham. https://doi.org/10.1007/978-3-031-08623-6_60


DOI: https://doi.org/10.1007/978-3-031-08623-6_60
Published: 30 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08622-9
Online ISBN: 978-3-031-08623-6
eBook Packages: Business and Management, Business and Management (R0)
