Abstract
Several recent techniques for solving Markov decision processes use dynamic Bayesian networks to represent tasks compactly. When this representation is not given in advance, it must be learned before these techniques can be applied. We develop an algorithm that learns dynamic Bayesian network representations of Markov decision processes from data collected through exploration in the environment. To accelerate data collection we develop a novel scheme for active learning of the networks. Because we assume the process can only be sampled along trajectories, not in arbitrary states, existing active learning techniques do not apply. Our active learning scheme instead selects actions that maximize the total entropy of the distributions used to evaluate potential refinements of the networks.
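The abstract's action-selection criterion, choosing the action whose candidate-refinement distributions have the largest total entropy, can be sketched in a few lines. This is only an illustrative sketch, not the paper's algorithm: the names `entropy`, `select_action`, and `refinement_dists`, and the toy distributions below, are all hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)

def select_action(refinement_dists):
    """Pick the action whose candidate-refinement distributions carry the
    largest total entropy, i.e. the action about whose outcome the current
    candidate refinements of the network are most uncertain.

    refinement_dists: dict mapping each action to a list of probability
    vectors, one per candidate refinement being evaluated.
    """
    def total_entropy(action):
        return sum(entropy(d) for d in refinement_dists[action])
    return max(refinement_dists, key=total_entropy)

# Hypothetical example: action 'b' is selected because its uniform
# distributions leave the refinement choice maximally uncertain.
dists = {
    'a': [[0.9, 0.1], [1.0, 0.0]],   # nearly deterministic: low entropy
    'b': [[0.5, 0.5], [0.5, 0.5]],   # uniform: maximal entropy
}
print(select_action(dists))  # 'b'
```

The intuition is that sampling where the candidate refinements are most uncertain yields the most informative data per trajectory step, which is what makes this an active-learning heuristic.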
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Jonsson, A., Barto, A. (2007). Active Learning of Dynamic Bayesian Networks in Markov Decision Processes. In: Miguel, I., Ruml, W. (eds) Abstraction, Reformulation, and Approximation. SARA 2007. Lecture Notes in Computer Science(), vol 4612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73580-9_22
Print ISBN: 978-3-540-73579-3
Online ISBN: 978-3-540-73580-9