Abstract
In this paper, we present our findings on applying a Markov-chain generative model to the actions of an agent in the Markov decision process framework. We outline a shortcoming of current solutions to reinforcement learning problems that utilize the agent-environment framework: they must analyze every environment state (for example, for Q-value estimation in Q-learning and deep Q-learning methods), which can be computationally heavy. We propose a simple method of ‘skipping’ the analysis of intermediate states, for which optimal actions are determined from the analysis of some previous state and modeled by a Markov chain. We observe that this approach can limit the agent’s exploratory behavior by driving the Markov chain’s transition probabilities close to either 0 or 1, and we show that the proposed \(L^1\)-normalization of transition probabilities successfully handles this problem. We tested our approach on the simple environment of the k-armed bandit problem and showed that it outperforms the commonly used gradient bandit algorithm.
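The abstract's idea can be illustrated with a minimal sketch: an agent for the k-armed bandit whose next action is sampled from a Markov chain conditioned on its previous action, with each row of the transition matrix kept a valid distribution by \(L^1\)-normalization and a small probability floor to preserve exploration. The paper's exact update rule is not reproduced here, so the reward-weighted update, the learning rate, and the probability floor below are illustrative assumptions rather than the authors' method.

```python
import random

class MarkovChainBandit:
    """Hypothetical Markov-chain agent for a k-armed bandit (sketch, not the paper's algorithm)."""

    def __init__(self, k, lr=0.1, floor=0.01, seed=0):
        self.k = k
        self.lr = lr        # step size for reinforcing a taken transition (assumed)
        self.floor = floor  # lower bound keeping probabilities away from 0, i.e. preserving exploration
        self.rng = random.Random(seed)
        # Transition matrix: P[i][j] = probability of taking action j after action i.
        self.P = [[1.0 / k] * k for _ in range(k)]
        self.prev = self.rng.randrange(k)

    def act(self):
        # Sample the next action from the Markov-chain row of the previous action.
        r, acc = self.rng.random(), 0.0
        for a, p in enumerate(self.P[self.prev]):
            acc += p
            if r <= acc:
                return a
        return self.k - 1  # guard against floating-point round-off

    def update(self, action, reward):
        row = self.P[self.prev]
        # Reinforce the taken transition in proportion to the observed reward.
        row[action] += self.lr * reward
        # Clip below the floor, then L1-normalize so the row remains a
        # probability distribution with no entry saturating at 0 or 1.
        row[:] = [max(p, self.floor) for p in row]
        s = sum(row)
        row[:] = [p / s for p in row]
        self.prev = action

# Usage: a 3-armed Bernoulli bandit where arm 2 pays off most often.
agent = MarkovChainBandit(3, seed=42)
for _ in range(2000):
    a = agent.act()
    p_win = 0.9 if a == 2 else 0.1
    agent.update(a, 1.0 if agent.rng.random() < p_win else 0.0)
```

After training, the transition probabilities toward the best arm should dominate each row, while the floor keeps every other transition strictly positive, which is the exploratory behavior the normalization is meant to protect.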
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sarnatskyi, V., Baklan, I. (2022). Markov-Chain-Based Agents for k-Armed Bandit Problem. In: Babichev, S., Lytvynenko, V. (eds) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 77. Springer, Cham. https://doi.org/10.1007/978-3-030-82014-5_44
Print ISBN: 978-3-030-82013-8
Online ISBN: 978-3-030-82014-5