Abstract
Several recent techniques for solving Markov decision processes use dynamic Bayesian networks to represent tasks compactly. When this representation is not given in advance, it must be learned before these techniques can be applied. We develop an algorithm that learns dynamic Bayesian network representations of Markov decision processes from data collected through exploration in the environment. To accelerate data collection we develop a novel scheme for active learning of the networks. Because we assume the process can only be sampled along trajectories, not in arbitrary states, existing active learning techniques do not apply. Our active learning scheme instead selects actions that maximize the total entropy of the distributions used to evaluate potential refinements of the networks.
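The abstract's action-selection criterion, choosing the action whose candidate-refinement distributions have the largest total entropy, can be sketched in a few lines. This is only an illustrative sketch, not the paper's algorithm: the names `entropy`, `select_action`, and `refinement_dists`, and the toy distributions below, are all hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)

def select_action(refinement_dists):
    """Pick the action whose candidate-refinement distributions carry the
    largest total entropy, i.e. the action about whose outcome the current
    candidate refinements of the network are most uncertain.

    refinement_dists: dict mapping each action to a list of probability
    vectors, one per candidate refinement being evaluated.
    """
    def total_entropy(action):
        return sum(entropy(d) for d in refinement_dists[action])
    return max(refinement_dists, key=total_entropy)

# Hypothetical example: action 'b' is selected because its uniform
# distributions leave the refinement choice maximally uncertain.
dists = {
    'a': [[0.9, 0.1], [1.0, 0.0]],   # nearly deterministic: low entropy
    'b': [[0.5, 0.5], [0.5, 0.5]],   # uniform: maximal entropy
}
print(select_action(dists))  # 'b'
```

The intuition is that sampling where the candidate refinements are most uncertain yields the most informative data per trajectory step, which is what makes this an active-learning heuristic.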
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Jonsson, A., Barto, A. (2007). Active Learning of Dynamic Bayesian Networks in Markov Decision Processes. In: Miguel, I., Ruml, W. (eds) Abstraction, Reformulation, and Approximation. SARA 2007. Lecture Notes in Computer Science(), vol 4612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73580-9_22
Print ISBN: 978-3-540-73579-3
Online ISBN: 978-3-540-73580-9