Abstract
We propose a novel framework, the continuous state controller (CSC), for controlling partially observable Markov decision processes (POMDPs). The CSC incorporates an auxiliary continuous state variable, called an internal state, whose stochastic process is Markov. The parameters of the internal state's transition probability are adjusted by policy gradient-based reinforcement learning, so that the dynamics of the underlying unknown system can be extracted. Computer simulations show that the CSC achieves good control of partially observable linear dynamical systems.
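To make the idea concrete, here is a minimal sketch of such a controller, assuming (hypothetically) a scalar internal state, observation, and action, a linear-Gaussian parameterization of the internal-state transition and the policy, and a plain REINFORCE-style gradient update standing in for the paper's policy-gradient method. All names, the parameterization, and the toy environment below are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Learnable parameters: internal-state transition z' ~ N(a*z + b*y, S_Z^2)
# and stochastic policy u ~ N(k*z', S_U^2). (Hypothetical parameterization.)
theta = {"a": 0.5, "b": 0.5, "k": 0.0}
S_Z, S_U = 0.1, 0.1    # fixed exploration noise scales, for simplicity
ALPHA = 1e-5           # small learning rate; no variance-reducing baseline here


def run_episode(env_step, T=100):
    """Roll out one episode; accumulate score-function (log-likelihood)
    gradients for the transition and policy parameters, plus the return."""
    z, y = 0.0, 0.0
    grads = {p: 0.0 for p in theta}
    G = 0.0
    for _ in range(T):
        # Sample the internal-state transition -- the part the CSC adapts
        # so that the internal state tracks the hidden system dynamics.
        mu_z = theta["a"] * z + theta["b"] * y
        z_new = mu_z + S_Z * rng.standard_normal()
        grads["a"] += (z_new - mu_z) / S_Z**2 * z
        grads["b"] += (z_new - mu_z) / S_Z**2 * y
        # Sample an action from the internal-state-conditioned policy.
        mu_u = theta["k"] * z_new
        u = mu_u + S_U * rng.standard_normal()
        grads["k"] += (u - mu_u) / S_U**2 * z_new
        y, r = env_step(u)   # the POMDP returns an observation and a reward
        G += r
        z = z_new
    return grads, G


def make_linear_pomdp():
    """Toy partially observable linear system: hidden state x, noisy
    observation y, quadratic regulation cost (reward = -x^2)."""
    state = {"x": 1.0}

    def step(u):
        state["x"] = 0.9 * state["x"] + u + 0.05 * rng.standard_normal()
        y = state["x"] + 0.05 * rng.standard_normal()
        return y, -state["x"] ** 2

    return step


# REINFORCE-style ascent on the expected return:
# theta += alpha * G * grad log-likelihood of the sampled trajectory.
for _ in range(2000):
    grads, G = run_episode(make_linear_pomdp())
    for p in theta:
        theta[p] += ALPHA * G * grads[p]
```

Because the internal state is continuous, the transition parameters (here a and b) can in principle absorb the hidden linear dynamics, which is the role the internal state plays in the partially observable linear systems of the paper's simulations.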
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Taniguchi, Y., Mori, T., Ishii, S. (2008). A Continuous Internal-State Controller for Partially Observable Markov Decision Processes. In: Kůrková, V., Neruda, R., Koutník, J. (eds) Artificial Neural Networks - ICANN 2008. ICANN 2008. Lecture Notes in Computer Science, vol 5163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87536-9_41
DOI: https://doi.org/10.1007/978-3-540-87536-9_41
Print ISBN: 978-3-540-87535-2
Online ISBN: 978-3-540-87536-9