Divide and Conquer: Hierarchical Reinforcement Learning and Task Decomposition in Humans

A chapter in Computational and Robotic Models of the Hierarchical Organization of Behavior

Abstract

The field of computational reinforcement learning (RL) has proved extremely useful in research on human and animal behavior and brain function. However, the simple forms of RL considered in most empirical research do not scale well, making their relevance to complex, real-world behavior unclear. In computational RL, one strategy for addressing the scaling problem is to introduce hierarchical structure, an approach that has intriguing parallels with human behavior. We have begun to investigate the potential relevance of hierarchical RL (HRL) to human and animal behavior and brain function. In the present chapter, we first review two results showing neural correlates of key predictions from HRL. We then focus on one aspect of this work, which deals with the question of how action hierarchies are initially established. Work in HRL suggests that hierarchy learning is accomplished by identifying useful subgoal states, and that this might in turn be accomplished through a structural analysis of the given task domain. We review results from a set of behavioral and neuroimaging experiments in which we have investigated the relevance of these ideas to human learning and decision making.
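The idea of finding subgoals through a structural analysis of the task domain can be made concrete. Graph-based subgoal-discovery methods, such as the betweenness analysis of Şimşek and Barto (2009) cited below, treat the task as a state-transition graph and nominate "bottleneck" states, those lying on many shortest paths between other states, as candidate subgoals. The following sketch applies that idea to a two-room toy domain; it is a minimal illustration under our own assumptions (the layout, function names, and use of networkx are not from the chapter):

    # Sketch of subgoal discovery by structural analysis: states with high
    # betweenness centrality in the state-transition graph are "bottlenecks"
    # (e.g., doorways) and make natural subgoal candidates.
    import networkx as nx

    def two_room_graph(w=5, h=5):
        """Two w-by-h grid rooms joined by a single doorway edge."""
        g = nx.Graph()
        for room in (0, 1):
            for x in range(w):
                for y in range(h):
                    if x + 1 < w:
                        g.add_edge((room, x, y), (room, x + 1, y))
                    if y + 1 < h:
                        g.add_edge((room, x, y), (room, x, y + 1))
        # Doorway: the middle of room 0's right wall meets room 1's left wall.
        g.add_edge((0, w - 1, h // 2), (1, 0, h // 2))
        return g

    g = two_room_graph()
    scores = nx.betweenness_centrality(g)
    top = sorted(scores, key=scores.get, reverse=True)[:2]
    print("candidate subgoals:", top)

Every shortest path between the two rooms funnels through the doorway, so the two states flanking it receive the highest betweenness scores and emerge as the natural subgoals.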

Notes

  1. This particular result provides preliminary evidence for “model-based” hierarchical planning in the Diuk et al. (2012a) delivery task.
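The delivery task itself is specified in Diuk et al. (2012a). Purely as a hypothetical illustration of what "model-based" hierarchical planning involves, the sketch below plans over a known state-transition graph in two stages: it first searches the small space of subgoal orderings, and only then expands each high-level leg into primitive moves. The function name, the permutation search, and the toy grid are our own assumptions, not the task's actual structure:

    # Hypothetical two-level planner: order the subgoals first (high level),
    # then expand each leg into a primitive path (low level). The
    # decomposition replaces one large flat search with several small ones.
    import itertools
    import networkx as nx

    def hierarchical_plan(g, start, goal, subgoals):
        """Return a primitive path start -> goal visiting every subgoal."""
        best, best_cost = None, float("inf")
        for order in itertools.permutations(subgoals):
            waypoints = [start, *order, goal]
            cost = sum(nx.shortest_path_length(g, a, b)
                       for a, b in zip(waypoints, waypoints[1:]))
            if cost < best_cost:
                best, best_cost = waypoints, cost
        path = [start]
        for a, b in zip(best, best[1:]):
            path += nx.shortest_path(g, a, b)[1:]  # expand one leg
        return path

    g = nx.grid_2d_graph(5, 5)  # a single 5x5 room as a stand-in map
    print(hierarchical_plan(g, (0, 0), (4, 4), [(4, 0), (0, 4)]))

The point of the decomposition is computational: the high-level search ranges over a handful of subgoal orderings, and each low-level search is confined to a short stretch of the graph.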

References

  • Aldridge, J. W., & Berridge, K. C. (1998). Coding of serial order by neostriatal neurons: a “natural action” approach to movement sequence. Journal of Neuroscience, 18(7), 2777–2787.

  • Badre, D. (2008). Cognitive control, hierarchy, and the rostro-caudal organization of the frontal lobes. Trends in Cognitive Sciences, 12(5), 193–200.

  • Baldassarre, G., & Mirolli, M. (Eds.) (2012). Intrinsically motivated learning in natural and artificial systems. Berlin: Springer.

  • Barto, A. G., & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4), 341–379.

  • Botvinick, M., & Plaut, D. C. (2004). Doing without schema hierarchies: a recurrent connectionist approach to normal and impaired routine sequential action. Psychological Review, 111(2), 395–429.

  • Botvinick, M. M., Niv, Y., Barto, A. G. (2009). Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition, 113(3), 262–280.

  • Bruner, J. (1973). Organization of early skilled action. Child Development, 44, 1–11.

  • Conway, C. M., & Christiansen, M. H. (2001). Sequential learning in non-human primates. Trends in Cognitive Sciences, 5(12), 539–546.

  • Cooper, R., & Shallice, T. (2000). Contention scheduling and the control of routine activities. Cognitive Neuropsychology, 17(4), 297–338.

  • Daw, N. D., Courville, A. C., Touretzky, D. S. (2003). Timing and partial observability in the dopamine system. In Advances in neural information processing systems (NIPS). Cambridge: MIT.

  • Dayan, P., & Hinton, G. E. (1993). Feudal reinforcement learning. In Advances in neural information processing systems 5 (pp. 271–278). San Mateo: Morgan Kaufmann.

  • Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13, 227–303.

  • Diuk, C., Cordova, N., Niv, Y., Botvinick, M. (2012a). Discovering hierarchical task structure. Submitted.

  • Diuk, C., Tsai, K., Wallis, J., Niv, Y., Botvinick, M. M. (2012b). Hierarchical learning induces two simultaneous, but separable, prediction errors in human basal ganglia. The Journal of Neuroscience, 33(13), 5797–5805.

  • Elfwing, S., Uchibe, E., Doya, K., Christensen, H. I. (2007). Evolutionary development of hierarchical learning structures. IEEE Transactions on Evolutionary Computation, 11(2), 249–264.

  • Fischer, K. W. (1980). A theory of cognitive development: the control and construction of hierarchies of skills. Psychological Review, 87(6), 477–531.

  • Fuster, J. M. (1997). The prefrontal cortex: anatomy, physiology, and neuropsychology of the frontal lobe, 3rd edn. Philadelphia: Lippincott-Raven.

  • Haruno, M., & Kawato, M. (2006). Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. Neural Networks, 19(8), 1242–1254.

  • Hengst, B. (2002). Discovering hierarchy in reinforcement learning with HEXQ. In Proceedings of the 19th international conference on machine learning, Sydney, Australia.

  • Houk, J., Adams, J., Barto, A. (1995). A model of how the basal ganglia generate and use neural signals that predict reinforcement. In J. Houk, J. Davis, D. Beiser (Eds.), Models of information processing in the basal ganglia. Cambridge: MIT.

  • Ito, M., & Doya, K. (2011). Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Current Opinion in Neurobiology, 21(3), 368–373.

  • Joel, D., Niv, Y., Ruppin, E. (2002). Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Networks, 15, 535–547.

  • Jonsson, A., & Barto, A. (2006). Causal graph based decomposition of factored MDPs. Journal of Machine Learning Research, 7, 2259–2301.

  • Koechlin, E., Ody, C., Kouneiher, F. (2003). The architecture of cognitive control in the human prefrontal cortex. Science, 302(5648), 1181–1185.

  • Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley.

  • Li, L., Walsh, T. J., Littman, M. L. (2006). Towards a unified theory of state abstraction for MDPs. In Proceedings of the ninth international symposium on artificial intelligence and mathematics (AMAI-06).

  • McGovern, A., & Barto, A. G. (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. In Proceedings of the 18th international conference on machine learning.

  • Menache, I., Mannor, S., Shimkin, N. (2002). Q-cut: dynamic discovery of sub-goals in reinforcement learning. In European conference on machine learning (ECML 2002) (pp. 295–306).

  • Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24, 167–202.

  • Miller, G. A., Galanter, E., Pribram, K. H. (1960). Plans and the structure of behavior. New York: Adams-Bannister-Cox.

  • Montague, P. R., Dayan, P., Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16(5), 1936–1947.

  • O’Doherty, J., Critchley, H., Deichmann, R., Dolan, R. J. (2003). Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. The Journal of Neuroscience, 23(21), 7931–7939.

  • Opsahl, T., Agneessens, F., Skvoretz, J. (2010). Node centrality in weighted networks: generalizing degree and shortest paths. Social Networks, 32, 245–251.

  • Parr, R., & Russell, S. J. (1998). Reinforcement learning with hierarchies of machines. In Advances in neural information processing systems.

  • Pickett, M., & Barto, A. (2002). PolicyBlocks: an algorithm for creating useful macro-actions in reinforcement learning. In Proceedings of the 19th international conference on machine learning.

  • Ribas-Fernandes, J. J. F., Solway, A., Diuk, C., McGuire, J. T., Barto, A. G., Niv, Y., Botvinick, M. M. (2011). A neural signature of hierarchical reinforcement learning. Neuron, 71(2), 370–379.

  • Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals, and understanding: an inquiry into human knowledge structures. Hillsdale: Lawrence Erlbaum.

  • Schapiro, A., Rogers, T., Cordova, N., Turk-Browne, N., Botvinick, M. (2013). Neural representations of events arise from temporal community structure. Nature Neuroscience, 16, 486–492.

  • Schembri, M., Mirolli, M., Baldassarre, G. (2007a). Evolution and learning in an intrinsically motivated reinforcement learning robot. In F. Almeida y Costa, L. M. Rocha, E. Costa, I. Harvey, A. Coutinho (Eds.), Advances in artificial life. Proceedings of the 9th European conference on artificial life. LNAI (vol. 4648, pp. 294–333). Berlin: Springer.

  • Schembri, M., Mirolli, M., Baldassarre, G. (2007b). Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot. In Y. Demiris, D. Mareschal, B. Scassellati, J. Weng (Eds.), Proceedings of the 6th international conference on development and learning (pp. E1–E6). London: Imperial College.

  • Schmidhuber, J. (1991a). A possibility for implementing curiosity and boredom in model-building neural controllers. In Proceedings of the international conference on simulation of adaptive behavior: from animals to animats (pp. 222–227).

  • Schmidhuber, J. (1991b). Curious model-building control systems. Proceedings of the International Conference on Neural Networks, 2, 1458–1463.

  • Schneider, D. W., & Logan, G. D. (2006). Hierarchical control of cognitive processes: switching tasks in sequences. Journal of Experimental Psychology: General, 135(4), 623–640.

  • Schoenbaum, G., Chiba, A. A., Gallagher, M. (1999). Neural encoding in orbitofrontal cortex and basolateral amygdala during olfactory discrimination learning. The Journal of Neuroscience, 19(5), 1876–1884.

  • Schultz, W., Dayan, P., Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593–1599.

  • Schultz, W., Tremblay, L., Hollerman, J. R. (2000). Reward processing in primate orbitofrontal cortex and basal ganglia. Cerebral Cortex, 10(3), 272–284.

  • Şimşek, O. (2008). Behavioral building blocks for autonomous agents: description, identification, and learning. PhD thesis, University of Massachusetts, Amherst.

  • Şimşek, O., & Barto, A. G. (2009). Skill characterization based on betweenness. In D. Koller, D. Schuurmans, Y. Bengio, L. Bottou (Eds.), Advances in neural information processing systems 21 (pp. 1497–1504).

  • Şimşek, O., Wolfe, A. P., Barto, A. G. (2005). Identifying useful subgoals in reinforcement learning by local graph partitioning. In Proceedings of the twenty-second international conference on machine learning.

  • Singh, S., Barto, A., & Chentanez, N. (2005). Intrinsically motivated reinforcement learning. In Advances in neural information processing systems 17.

  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: an introduction. Cambridge: MIT.

  • Sutton, R. S., Precup, D., Singh, S. (1999). Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112, 181–211.

  • Thrun, S., & Schwartz, A. (1995). Finding structure in reinforcement learning. In G. Tesauro, D. Touretzky, T. Leen (Eds.), Advances in neural information processing systems (NIPS) 7. Cambridge: MIT.

  • Yamada, S., & Tsuji, S. (1989). Selective learning of macro-operators with perfect causality. In Proceedings of the 11th international joint conference on artificial intelligence, volume 1 (pp. 603–608). San Francisco: Morgan Kaufmann.

  • Zacks, J. M., Speer, N. K., Swallow, K. M., Braver, T. S., Reynolds, J. R. (2007). Event perception: a mind-brain perspective. Psychological Bulletin, 133(2), 273–293.

Author information

Correspondence to Carlos Diuk.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Diuk, C., Schapiro, A., Córdova, N., Ribas-Fernandes, J., Niv, Y., Botvinick, M. (2013). Divide and Conquer: Hierarchical Reinforcement Learning and Task Decomposition in Humans. In: Baldassarre, G., Mirolli, M. (eds) Computational and Robotic Models of the Hierarchical Organization of Behavior. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39875-9_12

  • DOI: https://doi.org/10.1007/978-3-642-39875-9_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39874-2

  • Online ISBN: 978-3-642-39875-9

  • eBook Packages: Computer Science, Computer Science (R0)
