Learning and Coordinating Repertoires of Behaviors with Common Reward: Credit Assignment and Module Activation

Abstract

Understanding extended natural behavior will require a theoretical account of the entire system as it is engaged in perception and action involving multiple concurrent goals, such as foraging for different foods while avoiding different predators and looking for a mate. Reinforcement learning (RL) is a promising framework for doing so, as it addresses in a very general way the problem of choosing actions so as to maximize a measure of cumulative benefit through some form of learning, and many connections between RL and animal learning have been established. Within this framework, we consider the problem faced by a single agent comprising multiple separate elemental task learners, which we call modules, that jointly learn to solve tasks arising as different combinations of concurrent individual tasks across episodes. Sometimes the goal may be to collect different types of food; at other times several predators may have to be avoided. The individual modules have separate state representations, i.e. they obtain different inputs, but they must carry out actions jointly in the common action space of the agent. Only a single measure of success is observed: the sum of the reward contributions from all component tasks. We provide a computational solution for learning elemental task solutions as they contribute to composite goals, together with a solution for learning to schedule these modules across different composite tasks and episodes. The algorithm learns to choose the appropriate modules for a particular task and solves the problem of calculating each module's contribution to the total reward. The latter calculation combines current reward estimates with an error signal derived from the difference between the global reward and the sum of the reward estimates of the other co-active modules. As the modules interact through their action value estimates, action selection is based on their composite contribution to individual task combinations. The algorithm learns good action value functions for component tasks and task combinations, which is demonstrated on small classical problems and on a more complex visuomotor navigation task.
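To make the mechanics described above concrete, the following is a minimal sketch of how such a collection of modules could be organized. It is not the authors' implementation: it assumes tabular action values, an epsilon-greedy policy over the summed module values, SARSA-style updates, and per-state reward estimates, and all class names, function names, and learning parameters (alpha, beta, gamma, epsilon) are illustrative choices.

```python
# Hedged sketch of modular RL with credit assignment from a single global reward.
# Assumptions (not from the chapter): tabular Q-values, epsilon-greedy action
# selection over summed module values, SARSA-style updates, per-state reward
# estimates, and the parameter names alpha, beta, gamma, epsilon.
import random
from collections import defaultdict

class Module:
    """One elemental task learner with its own state representation."""
    def __init__(self, n_actions, alpha=0.1, beta=0.05, gamma=0.95):
        self.Q = defaultdict(float)               # Q[(state, action)] over the shared action space
        self.reward_estimate = defaultdict(float) # learned per-state reward share
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.n_actions = n_actions

def select_action(modules, states, epsilon=0.1):
    """Pick the action whose summed value across co-active modules is largest."""
    n_actions = modules[0].n_actions
    if random.random() < epsilon:
        return random.randrange(n_actions)
    totals = [sum(m.Q[(s, a)] for m, s in zip(modules, states)) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: totals[a])

def update(modules, states, action, global_reward, next_states, next_action):
    """Credit assignment and per-module value update after one transition."""
    # Credit assignment: each module's share of the global reward is whatever
    # the reward estimates of the other co-active modules leave unexplained.
    shares = []
    for i, (m, s) in enumerate(zip(modules, states)):
        others = sum(o.reward_estimate[so]
                     for j, (o, so) in enumerate(zip(modules, states)) if j != i)
        share = global_reward - others
        shares.append(share)
        # Slowly move the module's own reward estimate toward its assigned share.
        m.reward_estimate[s] += m.beta * (share - m.reward_estimate[s])
    # SARSA-style update of each module's action values using its assigned share.
    for m, s, s_next, share in zip(modules, states, next_states, shares):
        target = share + m.gamma * m.Q[(s_next, next_action)]
        m.Q[(s, action)] += m.alpha * (target - m.Q[(s, action)])
```

In this sketch, on every step the agent would call select_action with the co-active modules' individual observations, execute the chosen action in the common action space, observe the single global reward, and then call update with the successor observations and the next action. The module-activation (scheduling) problem described in the abstract is not shown; the sketch only illustrates the credit-assignment and joint action-selection steps.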



Acknowledgements

The research reported herein was supported by NIH Grants RR009283 and MH060624. C. R. was additionally supported by the EC MEXT project PLICON and by the EU project IM-CLeVeR, FP7-ICT-IP-231722.

Author information

Correspondence to Constantin A. Rothkopf.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Rothkopf, C.A., Ballard, D.H. (2013). Learning and Coordinating Repertoires of Behaviors with Common Reward: Credit Assignment and Module Activation. In: Baldassarre, G., Mirolli, M. (eds) Computational and Robotic Models of the Hierarchical Organization of Behavior. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39875-9_6

  • DOI: https://doi.org/10.1007/978-3-642-39875-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39874-2

  • Online ISBN: 978-3-642-39875-9
