Abstract
In this paper we propose a novel associative metric based on the classical conditioning paradigm that, much as happens in nature, identifies associations between stimuli perceived by a learning agent while it interacts with its environment. We use an associative tree structure to identify associations between the perceived stimuli, and use this structure to measure the degree of similarity between states in factored Markov decision processes (MDPs). Our approach provides a state-space metric that requires no prior knowledge of the structure of the underlying decision problem and is designed to be learned online, i.e., as the agent interacts with its environment. The metric is thus amenable to reinforcement learning (RL) settings, allowing the learning agent to generalize its experience to unvisited states and thereby improve overall learning performance. We illustrate the application of our method in several problems of varying complexity and show that our metric yields performance comparable to that of other well-studied metrics that require full knowledge of the decision problem.
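To make the abstract's idea concrete, the sketch below shows one way a state-space similarity metric can plug into RL so that experience generalizes to unvisited states. It is a minimal illustration, not the paper's method: it uses the Jaccard index over stimulus sets as a simple stand-in for the learned associative-tree metric, combined with Q-learning whose updates are spread to similar states in the spirit of Ribeiro and Szepesvári's Q-learning with spreading. The `SpreadingQLearner` class, the `sigma` parameter, and the encoding of factored states as sets of stimuli are all assumptions made for this example.

```python
from collections import defaultdict


def jaccard_similarity(stimuli_a, stimuli_b):
    """Jaccard index between two sets of perceived stimuli.

    A stand-in for the paper's associative-tree metric, which instead
    weights stimuli by co-occurrence statistics gathered online.
    """
    if not stimuli_a and not stimuli_b:
        return 1.0
    return len(stimuli_a & stimuli_b) / len(stimuli_a | stimuli_b)


class SpreadingQLearner:
    """Tabular Q-learning whose temporal-difference updates are spread
    to similar states (cf. Q-learning with spreading, Ribeiro and
    Szepesvári, 1996). Names and parameters are illustrative only."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, sigma=0.5):
        self.q = defaultdict(float)   # (state, action) -> value estimate
        self.seen = set()             # factored states visited so far
        self.actions = actions
        self.alpha, self.gamma, self.sigma = alpha, gamma, sigma

    def update(self, state, action, reward, next_state):
        state, next_state = frozenset(state), frozenset(next_state)
        self.seen.update((state, next_state))
        target = reward + self.gamma * max(
            self.q[(next_state, b)] for b in self.actions)
        # Spread the update: states similar to the visited one receive a
        # weighted share of the experience, so the agent generalizes to
        # unvisited-but-similar states.
        for other in self.seen:
            weight = 1.0 if other == state else \
                self.sigma * jaccard_similarity(state, other)
            self.q[(other, action)] += \
                self.alpha * weight * (target - self.q[(other, action)])
```

For instance, after `agent.update({"light:on", "door:closed"}, "open", 1.0, {"light:on", "door:open"})`, any previously seen state sharing stimuli with `{"light:on", "door:closed"}` also has its value estimate nudged toward the same target, in proportion to its similarity.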
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sequeira, P., Melo, F.S., Paiva, A. (2013). An Associative State-Space Metric for Learning in Factored MDPs. In: Correia, L., Reis, L.P., Cascalho, J. (eds.) Progress in Artificial Intelligence. EPIA 2013. Lecture Notes in Computer Science, vol. 8154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40669-0_15
DOI: https://doi.org/10.1007/978-3-642-40669-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40668-3
Online ISBN: 978-3-642-40669-0