Abstract
In this article, we propose a feature extraction technique for decision-theoretic planning problems in partially observable stochastic domains and present a novel approach for solving them. To maximize the expected future reward, it suffices for the agent to estimate a Markov chain over a statistic related to rewards. In our approach, an auxiliary state variable whose stochastic process satisfies the Markov property, called the internal state, is introduced into the model under the assumption that rewards depend on the pair of an internal state and an action. The agent then estimates the dynamics of the internal-state model by maximum likelihood inference while acquiring its policy; the internal-state model represents an essential feature for decision-making. Computer simulation results show that our technique can find an appropriate feature for acquiring a good policy, and can achieve faster learning with fewer policy parameters than a conventional algorithm on a reasonably sized partially observable problem.
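The abstract describes the approach only at a high level. As a rough illustration of the core idea, the sketch below implements belief filtering and trajectory likelihood for an internal-state model in which observations are emitted by the internal state and rewards depend on the (internal state, action) pair. This is a minimal sketch under assumptions not stated in the abstract: discrete internal states and observations, Gaussian reward noise with a fixed variance, and all names (N_Z, T, O, mu, forward_log_likelihood) are hypothetical, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model sizes for illustration (not taken from the paper).
N_Z, N_A, N_O = 3, 2, 4  # internal states, actions, observations

# Randomly initialized parameters of the internal-state model:
#   T[a, z, z'] : P(z' | z, a), action-conditioned internal-state transitions
#   O[z, o]     : P(o | z), observation emissions
#   mu[z, a]    : mean reward for the (internal state, action) pair
T = rng.dirichlet(np.ones(N_Z), size=(N_A, N_Z))
O = rng.dirichlet(np.ones(N_O), size=N_Z)
mu = rng.normal(size=(N_Z, N_A))
SIGMA = 1.0  # assumed fixed reward noise

def gauss(r, m, s=SIGMA):
    """Gaussian density of reward r under mean(s) m."""
    return np.exp(-0.5 * ((r - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def forward_log_likelihood(actions, observations, rewards):
    """Forward algorithm: log-likelihood of a trajectory of
    (action, observation, reward) triples under the internal-state model,
    plus the final filtered belief over internal states."""
    b = np.full(N_Z, 1.0 / N_Z)  # uniform initial belief
    ll = 0.0
    for a, o, r in zip(actions, observations, rewards):
        b_pred = T[a].T @ b               # propagate belief through T[a]
        w = O[:, o] * gauss(r, mu[:, a])  # emission weight per internal state
        p = b_pred @ w                    # marginal likelihood of (o, r)
        ll += np.log(p)
        b = b_pred * w / p                # condition the belief on (o, r)
    return ll, b

# Example: score a short random trajectory.
acts = rng.integers(N_A, size=10)
obss = rng.integers(N_O, size=10)
rews = rng.normal(size=10)
ll, belief = forward_log_likelihood(acts, obss, rews)
```

Maximizing such a trajectory likelihood over T, O, and mu (for instance with an EM-style procedure) would correspond, loosely, to the maximum likelihood inference step described in the abstract, and the filtered belief b would then serve as the compact Markovian feature on which the policy is conditioned.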
Cite this paper
Fujita, H., Nakamura, Y., Ishii, S. (2006). Feature Extraction for Decision-Theoretic Planning in Partially Observable Environments. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_85