
Action Discovery and Intrinsic Motivation: A Biologically Constrained Formalisation


Abstract

We introduce a biologically motivated, formal framework or “ontology” for dealing with many aspects of action discovery, which we argue is an example of intrinsically motivated behaviour (as such, this chapter is a companion to that by Redgrave et al. in this volume). We argue that action discovery requires an interplay between separate internal forward models of prediction and inverse models mapping outcomes to actions. The process of learning actions is driven by transient changes in the animal’s policy (repetition bias), which are, in turn, a result of unpredicted, phasic sensory information (“surprise”). The notion of salience as value is introduced and broken down into contributions from novelty (or surprise), immediate reward acquisition, or general task/goal attainment. Many other aspects of biological action discovery emerge naturally in our framework, which aims to guide future modelling efforts in this domain.


Notes

  1. We use the term “vector”, but in the sense adopted in computer science to mean a 1D-array or n-tuple; there is no implication that these n-tuples form a true vector space. Indeed, if we use only positive-valued components (a natural choice to indicate presence of a feature), the space does not have an additive inverse.

  2. We use the normal convention that y denotes a function and y(t) its value at time t.

  3. Throughout this chapter, we use the expression “A is a subspace of B” (or A ⊂ B) to mean that A is defined over a subset of the features in B. Also, note that B is supposed to be upper case Greek β in keeping with the notation that Greek and Roman symbols refer to the external world and neural representations, respectively.

  4. $N_{S \setminus M}$ is defined over the features in $N_S$ which are not contained in $N_M$.

  5. Other notions of action efficiency/optimality could have been used (e.g. minimal energy expenditure), but temporal optimality and action simplicity seem most appropriate in the context of action discovery.

  6. This is equivalent to xH(x) where H(x) is the Heaviside function, but the notation in the text is more expedient here.

  7. Some researchers call this a competence model 7.

  8. The striatum is the main input nucleus of the basal ganglia.

  9. Additionally, this may be supported by some kind of eligibility trace associated with s (a) which dopamine acts upon (Gurney et al. 2009a).
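
As an illustration of the expression xH(x) mentioned in note 6, the product of x with the Heaviside function coincides with half-wave rectification, max(x, 0). The following is a minimal Python sketch of that identity only; the function names `heaviside` and `rectify` are hypothetical, and setting H(0) = 0 is an assumption that does not affect the product.

```python
def heaviside(x: float) -> float:
    """Heaviside step: 0 for x <= 0, 1 for x > 0.

    The value at x = 0 is a convention (assumed to be 0 here);
    it has no effect on x * H(x), which is 0 at x = 0 either way.
    """
    return 1.0 if x > 0 else 0.0


def rectify(x: float) -> float:
    """Return x * H(x), i.e. x when x is positive and zero otherwise."""
    return x * heaviside(x)


# Compare x * H(x) with max(x, 0) on a few sample values.
for sample in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert rectify(sample) == max(sample, 0.0)
```

The loop only checks the identity at a handful of points; it is not a statement about how the chapter itself defines or uses the quantity.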

References

  1. Allport, A., Sanders, H., Heuer, A.: Selection for action: Some behavioural and neurophysiological considerations of attention and action. In: Perspectives on Perception and Action. Lawrence Erlbaum Associates Inc., Hillsdale (1987)


  2. Baldi, P., Itti, L.: Of bits and wows: A Bayesian theory of surprise with applications to attention. Neural Netw. 23(5), 649–666 (2010)


  3. Balleine, B.W., Dickinson, A.: Goal-directed instrumental action: Contingency and incentive learning and their cortical substrates. Neuropharmacology 37(4–5), 407–419 (1998)


  4. Barto, A., Singh, S., Chentanez, N.: Intrinsically motivated reinforcement learning. In: 18th Annual Conference on Neural Information Processing Systems (NIPS). Vancouver (2004)


  5. Cisek, P.: Cortical mechanisms of action selection: The affordance competition hypothesis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 362(1485), 1585–1599 (2007)


  6. Cisek, P., Kalaska, J.: Neural mechanisms for interacting with a world full of action choices. Annu. Rev. Neurosci. 33, 269–298 (2010)


  7. Comoli, E., Coizet, V., Boyes, J., Bolam, J., Canteras, N., Quirk, R., Overton, P., Redgrave, P.: A direct projection from superior colliculus to substantia nigra for detecting salient visual events. Nat. Neurosci. 6(9), 974–980 (2003)


  8. Connor, C.E., Egeth, H.E., Yantis, S.: Visual attention: Bottom-Up versus Top-Down. Curr. Biol. 14(19), R850–R852 (2004)


  9. Cope, A., Chambers, J., Gurney, K.: Object-based biasing for attentional control of gaze: A comparison of biologically plausible mechanisms. BMC Neurosci. 10(Suppl. 1), P19 (2009)


  10. Dommett, E., Coizet, V., Blaha, C., Martindale, J., Lefebvre, V., Walton, N., Mayhew, J., Overton, P., Redgrave, P.: How visual stimuli activate dopaminergic neurons at short latency. Science 307(5714), 1476–1479 (2005)


  11. Fiorillo, C.D., Tobler, P.N., Schultz, W.: Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299(5614), 1898 (2003)


  12. Friston, K.: A theory of cortical responses. Philos. Trans. R. Soc. B Biol. Sci. 360(1456), 815–836 (2005)


  13. Gene Ontology Consortium: Creating the gene ontology resource: Design and implementation. Genome Res. 11(8), 1425–1433 (2001)


  14. Gruber, T.: A translation approach to portable ontology specification. http://www-ksl.stanford.edu/kst/what-is-an-ontology.html (1992)

  15. Gurney, K., Humphries, M., Redgrave, P.: Cortico-striatal plasticity for action-outcome learning using spike timing dependent eligibility. BMC Neurosci. 10(Suppl. 1), P135 (2009a)


  16. Gurney, K., Hussain, A., Chambers, J., Abdullah, R.: Controlled and automatic processing in animals and machines with application to autonomous vehicle control. In: Controlled and Automatic Processing in Animals and Machines with Application to Autonomous Vehicle Control, Lecture Notes in Computer Science, vol. 5768, pp. 198–207. Springer, Berlin (2009b)


  17. Ikeda, T., Hikosaka, O.: Reward-dependent gain and bias of visual responses in primate superior colliculus. Neuron 39(4), 693–700 (2003)


  18. Körding, K.P., Wolpert, D.M.: Bayesian decision theory in sensorimotor control. Trends Cogn. Sci. 10(7), 319–326 (2006)


  19. Marr, D., Poggio, T.: From understanding computation to understanding neural circuitry. Technical report, MIT AI Laboratory (1976)


  20. Matsumoto, M., Hikosaka, O.: Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447(7148), 1111–1115 (2007)


  21. Oudeyer, P., Kaplan, F.: What is intrinsic motivation? A typology of computational approaches. Front. Neurorobot. 1, 6 (2007). PMID 18958277


  22. Poggio, T., Koch, C.: Ill-posed problems in early vision: From computational theory to analogue networks. Proc. R. Soc. Lond. B. Biol. Sci. 226(1244), 303 (1985)


  23. Ranganath, C., Rainer, G.: Neural mechanisms for detecting and remembering novel events. Nat. Rev. Neurosci. 4(3), 193–202 (2003)


  24. Redgrave, P., Gurney, K.: The short-latency dopamine signal: A role in discovering novel actions? Nat. Rev. Neurosci. 7(12) (2006)


  25. Redgrave, P., Gurney, K., Reynolds, J.: What is reinforced by phasic dopamine signals? Brain Res. Rev. 58(2), 322–339 (2008)


  26. Redgrave, P., Gurney, K., Stafford, T., Thirkettle, M., Lewis, J.: The role of the basal ganglia in discovering novel actions. In: Baldassarre, G., Mirolli, M. (eds.) Intrinsically Motivated Learning in Natural and Artificial Systems, pp. 129–149. Springer, Berlin (2012)


  27. Redgrave, P., Prescott, T., Gurney, K.: The basal ganglia: A vertebrate solution to the selection problem? Neuroscience 89, 1009–1023 (1999)


  28. Reynolds, J.N.J., Wickens, J.R.: Dopamine-dependent plasticity of corticostriatal synapses. Neural Netw. 15(4–6), 507–521 (2002)


  29. Ryan, R.M., Deci, E.L.: Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemp. Educ. Psychol. 25(1), 54–67 (2000)


  30. Schleidt, M., Kien, J.: Segmentation in behavior and what it can tell us about brain function. Hum. Nat. 8(1), 77–111 (1997)


  31. Schmidhuber, J.: Driven by compression progress: A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. In: Anticipatory Behavior in Adaptive Learning Systems, pp. 48–76 (2009)


  32. Schultz, W.: Dopamine signals for reward value and risk: Basic and recent data. Behav. Brain Funct. 6(1), 24 (2010)


  33. Schultz, W., Dayan, P., Montague, P.: A neural substrate of prediction and reward. Science 275, 1593–1599 (1997)


  34. Snyder, L.H., Batista, A.P., Andersen, R.A.: Coding of intention in the posterior parietal cortex. Nature 386(6621), 167–170 (1997)


  35. Sokolov, E.N.: Higher nervous functions: The orienting reflex. Annu. Rev. Physiol. 25(1), 545–580 (1963)


  36. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT, Cambridge (1998)


  37. Thompson, K.G., Bichot, N.P., Sato, T.R.: Frontal eye field activity before visual search errors reveals the integration of Bottom-Up and Top-Down salience. J. Neurophysiol. 93(1), 337–351 (2005)


  38. Timberlake, W., Lucas, G.A.: The basis of superstitious behavior: Chance contingency, stimulus substitution, or appetitive behavior? J. Exp. Anal. Behav. 44(3), 279 (1985)


  39. Tobler, P., Fiorillo, C., Schultz, W.: Adaptive coding of reward value by dopamine neurons. Science 307(5715), 1642 (2005)


  40. Tolman, E.: Cognitive maps in rats and men. Psychol. Rev. 55(4), 189 (1948)


  41. Wurtz, R.H., Albano, J.E.: Visual-motor function of the primate superior colliculus. Annu. Rev. Neurosci. 3(1), 189–226 (1980)


  42. Yin, H.H., Knowlton, B.J.: The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7(6), 464–476 (2006)



Acknowledgements

This chapter was written while the authors were in receipt of research funding from The Wellcome Trust, BBSRC and EPSRC.

This research has also received funds from the European Commission 7th Framework Programme (FP7/2007-2013), “Challenge 2 - Cognitive Systems, Interaction, Robotics”, Grant Agreement No. ICT-IP-231722, Project “IM-CLeVeR—Intrinsically Motivated Cumulative Learning Versatile Robots”. (NL was partially supported by EU Framework project EFAA (ICT-270490))

Author information

Corresponding author

Correspondence to Kevin Gurney.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Gurney, K., Lepora, N., Shah, A., Koene, A., Redgrave, P. (2013). Action Discovery and Intrinsic Motivation: A Biologically Constrained Formalisation. In: Baldassarre, G., Mirolli, M. (eds) Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32375-1_7

  • DOI: https://doi.org/10.1007/978-3-642-32375-1_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32374-4

  • Online ISBN: 978-3-642-32375-1

  • eBook Packages: Computer Science, Computer Science (R0)
