Active Learning of Dynamic Bayesian Networks in Markov Decision Processes

  • Conference paper
Abstraction, Reformulation, and Approximation (SARA 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4612)

Abstract

Several recent techniques for solving Markov decision processes use dynamic Bayesian networks to represent tasks compactly. This representation is not always given in advance, in which case it must be learned before these techniques can be applied. We develop an algorithm that learns dynamic Bayesian network representations of Markov decision processes from data collected through exploration in the environment. To accelerate data collection, we introduce a novel scheme for active learning of the networks. Because we assume the process can be sampled only along trajectories, not in arbitrary states, existing active learning techniques do not apply. Our scheme instead selects actions that maximize the total entropy of the distributions used to evaluate potential refinements of the networks.
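To make the selection rule concrete, here is a minimal sketch of entropy-based action selection under the trajectory constraint, assuming discrete next-step variables with values in {0, ..., num_values-1} and Dirichlet-smoothed multinomial estimates for each candidate refinement. The names and data structures (select_action, refinements, counts, ALPHA) are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter, defaultdict

ALPHA = 1.0  # Dirichlet pseudo-count for smoothing (assumed value)

def posterior(obs, num_values):
    """Posterior mean of a Dirichlet-multinomial over num_values outcomes."""
    total = sum(obs.values()) + ALPHA * num_values
    return [(obs.get(v, 0) + ALPHA) / total for v in range(num_values)]

def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def select_action(state, actions, refinements, counts, num_values):
    """Pick the action whose candidate refinements are collectively most
    uncertain, i.e. whose summed predictive-distribution entropy is largest.

    refinements[a]      -- candidate parent sets (tuples of variable
                           indices) under evaluation for action a
    counts[(a, r, ctx)] -- Counter of next-step values observed for
                           refinement r when its parents took values ctx
    """
    def total_entropy(a):
        score = 0.0
        for r in refinements[a]:
            ctx = tuple(state[i] for i in r)  # project state onto parent set
            score += entropy(posterior(counts[(a, r, ctx)], num_values))
        return score
    return max(actions, key=total_entropy)

# Tiny usage example: two binary state variables, two actions.
counts = defaultdict(Counter)
refinements = {0: [(0,), (0, 1)], 1: [(1,)]}  # candidate parent sets per action
action = select_action((1, 0), [0, 1], refinements, counts, num_values=2)
```

Under this rule, a parent context that has never been sampled yields a uniform posterior and hence maximal entropy, so the agent is steered toward under-explored refinements that are reachable along its current trajectory. The flat counts table and per-action candidate lists are simplifications for illustration.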

Editor information

Ian Miguel, Wheeler Ruml

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jonsson, A., Barto, A. (2007). Active Learning of Dynamic Bayesian Networks in Markov Decision Processes. In: Miguel, I., Ruml, W. (eds) Abstraction, Reformulation, and Approximation. SARA 2007. Lecture Notes in Computer Science, vol. 4612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73580-9_22

  • DOI: https://doi.org/10.1007/978-3-540-73580-9_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73579-3

  • Online ISBN: 978-3-540-73580-9

  • eBook Packages: Computer Science, Computer Science (R0)
