Abstract
In this paper we propose a novel associative metric based on the classical conditioning paradigm that, much as happens in nature, identifies associations between stimuli perceived by a learning agent while it interacts with its environment. We use an associative tree structure to identify associations between the perceived stimuli, and use this structure to measure the degree of similarity between states in factored Markov decision processes (MDPs). Our approach provides a state-space metric that requires no prior knowledge of the structure of the underlying decision problem and is designed to be learned online, i.e., as the agent interacts with its environment. The metric is thus amenable to reinforcement learning (RL) settings, allowing the learning agent to generalize its experience to unvisited states and thereby improve overall learning performance. We illustrate the application of our method in several problems of varying complexity and show that our metric yields performance comparable to that of other well-studied metrics that require full knowledge of the decision problem.
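To make the abstract's idea concrete, the sketch below shows one way a state-space similarity metric can plug into RL so that experience generalizes to unvisited states. It is a minimal illustration, not the paper's method: it uses the Jaccard index over stimulus sets as a simple stand-in for the learned associative-tree metric, combined with Q-learning whose updates are spread to similar states in the spirit of Ribeiro and Szepesvári's Q-learning with spreading. The `SpreadingQLearner` class, the `sigma` parameter, and the encoding of factored states as sets of stimuli are all assumptions made for this example.

```python
from collections import defaultdict


def jaccard_similarity(stimuli_a, stimuli_b):
    """Jaccard index between two sets of perceived stimuli.

    A stand-in for the paper's associative-tree metric, which instead
    weights stimuli by co-occurrence statistics gathered online.
    """
    if not stimuli_a and not stimuli_b:
        return 1.0
    return len(stimuli_a & stimuli_b) / len(stimuli_a | stimuli_b)


class SpreadingQLearner:
    """Tabular Q-learning whose temporal-difference updates are spread
    to similar states (cf. Q-learning with spreading, Ribeiro and
    Szepesvári, 1996). Names and parameters are illustrative only."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, sigma=0.5):
        self.q = defaultdict(float)   # (state, action) -> value estimate
        self.seen = set()             # factored states visited so far
        self.actions = actions
        self.alpha, self.gamma, self.sigma = alpha, gamma, sigma

    def update(self, state, action, reward, next_state):
        state, next_state = frozenset(state), frozenset(next_state)
        self.seen.update((state, next_state))
        target = reward + self.gamma * max(
            self.q[(next_state, b)] for b in self.actions)
        # Spread the update: states similar to the visited one receive a
        # weighted share of the experience, so the agent generalizes to
        # unvisited-but-similar states.
        for other in self.seen:
            weight = 1.0 if other == state else \
                self.sigma * jaccard_similarity(state, other)
            self.q[(other, action)] += \
                self.alpha * weight * (target - self.q[(other, action)])
```

For instance, after `agent.update({"light:on", "door:closed"}, "open", 1.0, {"light:on", "door:open"})`, any previously seen state sharing stimuli with `{"light:on", "door:closed"}` also has its value estimate nudged toward the same target, in proportion to its similarity.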
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sequeira, P., Melo, F.S., Paiva, A. (2013). An Associative State-Space Metric for Learning in Factored MDPs. In: Correia, L., Reis, L.P., Cascalho, J. (eds.) Progress in Artificial Intelligence. EPIA 2013. Lecture Notes in Computer Science, vol. 8154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40669-0_15
DOI: https://doi.org/10.1007/978-3-642-40669-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40668-3
Online ISBN: 978-3-642-40669-0