Abstract
Understanding extended natural behavior requires a theoretical account of the entire system as it engages in perception and action under multiple concurrent goals, such as foraging for different foods while avoiding several predators and looking for a mate. Reinforcement learning (RL) is a promising framework for this, as it addresses in a very general way the problem of choosing actions so as to maximize a measure of cumulative benefit through some form of learning, and many connections between RL and animal learning have been established. Within this framework, we consider a single agent composed of multiple separate elemental task learners, which we call modules, that jointly learn to solve tasks arising as different combinations of concurrent individual tasks across episodes. Sometimes the goal may be to collect different types of food; at other times several predators must be avoided. The individual modules have separate state representations, i.e. they receive different inputs, but they must act jointly in the agent's common action space. Only a single measure of success is observed: the sum of the reward contributions of all component tasks. We provide a computational solution both for learning elemental task solutions as they contribute to composite goals and for learning how to schedule these modules across different composite tasks and episodes. The algorithm learns to choose the appropriate modules for a particular task and solves the problem of estimating each module's contribution to the total reward. The latter calculation combines each module's current reward estimate with an error signal given by the difference between the global reward and the sum of the reward estimates of the other co-active modules. Because the modules interact through their action-value estimates, action selection is based on their composite contribution to each task combination. The algorithm learns good action-value functions for component tasks and task combinations, which we demonstrate on small classical problems and on a more complex visuomotor navigation task.
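To make the credit-assignment idea described above concrete, the following Python fragment is a minimal sketch of one way such an update could look for tabular modules; the class and function names (Module, select_action, update) and the specific learning-rule details are illustrative assumptions, not the authors' implementation. Each module keeps action values over its own state representation plus a running estimate of its share of the global reward; actions are chosen from the summed module values, and each module's reward estimate is pulled toward the global reward minus the estimated contributions of the other co-active modules.

# Minimal illustrative sketch (assumed names and details, not the authors' code):
# modular SARSA-style learners that share an action space, observe only the
# global reward, and apportion it via per-module reward estimates.

import numpy as np

class Module:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95):
        self.Q = np.zeros((n_states, n_actions))     # action values over the module's own state
        self.rhat = np.zeros((n_states, n_actions))  # estimated share of the global reward
        self.alpha, self.gamma = alpha, gamma

def select_action(modules, states, epsilon=0.1):
    # Act greedily with respect to the summed Q-values of the active modules
    # (their composite contribution), with epsilon-greedy exploration.
    n_actions = modules[0].Q.shape[1]
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    total = sum(m.Q[s] for m, s in zip(modules, states))
    return int(np.argmax(total))

def update(modules, states, action, global_reward, next_states, next_action):
    # Credit assignment: move each module's reward estimate toward the global
    # reward minus the reward estimates of the other co-active modules.
    rhat_sum = sum(m.rhat[s, action] for m, s in zip(modules, states))
    for m, s, s2 in zip(modules, states, next_states):
        others = rhat_sum - m.rhat[s, action]
        m.rhat[s, action] += m.alpha * (global_reward - others - m.rhat[s, action])
        # SARSA-style update of the module's action values using its credited reward.
        td_target = m.rhat[s, action] + m.gamma * m.Q[s2, next_action]
        m.Q[s, action] += m.alpha * (td_target - m.Q[s, action])

In this sketch the modules never see each other's states; they are coupled only through the shared action and the decomposition of the single observed reward, mirroring the setting described in the abstract.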
Acknowledgements
The research reported herein was supported by NIH Grants RR009283 and MH060624. C. R. was additionally supported by the EC MEXT-project PLICON and by the EU-Project IM-CLeVeR, FP7-ICT-IP-231722.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Rothkopf, C.A., Ballard, D.H. (2013). Learning and Coordinating Repertoires of Behaviors with Common Reward: Credit Assignment and Module Activation. In: Baldassarre, G., Mirolli, M. (eds) Computational and Robotic Models of the Hierarchical Organization of Behavior. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39875-9_6
DOI: https://doi.org/10.1007/978-3-642-39875-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39874-2
Online ISBN: 978-3-642-39875-9