An Associative State-Space Metric for Learning in Factored MDPs

  • Conference paper
Progress in Artificial Intelligence (EPIA 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8154)

Abstract

In this paper we propose a novel associative metric based on the classical conditioning paradigm that, much as happens in nature, identifies associations between stimuli perceived by a learning agent while it interacts with the environment. We use an associative tree structure to identify associations between the perceived stimuli, and use this structure to measure the degree of similarity between states in factored Markov decision problems. Our approach provides a state-space metric that requires no prior knowledge of the structure of the underlying decision problem and is designed to be learned online, i.e., as the agent interacts with its environment. Our metric is thus amenable to application in reinforcement learning (RL) settings, allowing the learning agent to generalize its experience to unvisited states and thereby improving overall learning performance. We illustrate the application of our method in several problems of varying complexity and show that our metric leads to performance comparable to that obtained with other well-studied metrics that require full knowledge of the decision problem.
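To make the generalization idea concrete, the sketch below shows how a state-space similarity metric can be used to spread a tabular Q-learning update to other visited states in a factored MDP. This is a minimal illustration under stated assumptions, not the paper's method: the class and parameter names (`SpreadingQLearner`, `spread`) are hypothetical, and a fixed Jaccard similarity over state features stands in for the associative metric, which the paper instead learns online from stimulus associations.

```python
from collections import defaultdict

def jaccard_similarity(s1, s2):
    """Similarity between two factored states, each a set of
    (feature, value) pairs: |intersection| / |union|."""
    a, b = set(s1), set(s2)
    return len(a & b) / len(a | b) if (a | b) else 1.0

class SpreadingQLearner:
    """Tabular Q-learning that, after each real update, pushes a
    similarity-weighted fraction of the same TD error to every other
    visited state. Hypothetical sketch: the associative metric of the
    paper would replace jaccard_similarity here."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, spread=0.3):
        self.Q = defaultdict(float)   # (state, action) -> value
        self.visited = set()          # states seen so far
        self.actions = actions
        self.alpha, self.gamma, self.spread = alpha, gamma, spread

    def update(self, s, a, r, s_next):
        s, s_next = frozenset(s), frozenset(s_next)
        self.visited.update([s, s_next])
        # Standard Q-learning update on the visited state.
        target = r + self.gamma * max(self.Q[(s_next, b)] for b in self.actions)
        td = target - self.Q[(s, a)]
        self.Q[(s, a)] += self.alpha * td
        # Generalize: spread a similarity-scaled share of the TD
        # error to all other known states, so unvisited or rarely
        # visited states benefit from this experience.
        for z in self.visited:
            if z != s:
                w = jaccard_similarity(s, z)
                self.Q[(z, a)] += self.alpha * self.spread * w * td
```

For example, after one rewarded transition from the state {x=1, y=0}, the nearby state {x=1, y=1} (Jaccard similarity 1/3) receives a proportionally smaller value update for the same action, without ever having been the update's origin.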




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sequeira, P., Melo, F.S., Paiva, A. (2013). An Associative State-Space Metric for Learning in Factored MDPs. In: Correia, L., Reis, L.P., Cascalho, J. (eds) Progress in Artificial Intelligence. EPIA 2013. Lecture Notes in Computer Science (LNAI), vol 8154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40669-0_15

  • DOI: https://doi.org/10.1007/978-3-642-40669-0_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40668-3

  • Online ISBN: 978-3-642-40669-0

  • eBook Packages: Computer Science (R0)
