Learning and control of exploration primitives

Journal of Computational Neuroscience

Abstract

Animals explore novel environments cautiously, alternating between curiosity-driven behavior and retreats. We present a detailed formal framework for exploration that generates behavior maintaining a constant level of novelty. Like other complex behaviors, the resulting exploratory behavior is composed of exploration motor primitives. These primitives can be learned during a developmental period in which the agent repeatedly interacts with environments that share common traits, allowing motor learning to transfer to novel environments. The exploration motor primitives emerge from reinforcement learning in which information gain serves as the intrinsic reward. Furthermore, actors and critics are local and egocentric, which enables transfer to other environments. Novelty control, i.e., the principle that governs the maintenance of constant novelty, is implemented by a central action-selection mechanism that switches between the emergent exploration primitives and a retreat policy based on the currently experienced novelty. The framework has only a few parameters, and its time scales, learning rates and thresholds are adaptive, so it can be easily applied to many scenarios. We implement it by modeling the rodent’s whisking system and show that it can explain characteristic observed behaviors. The paper concludes with a detailed discussion of the framework’s merits and flaws compared to related models.
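
The novelty-control idea described in the abstract can be illustrated with a toy sketch. This is not the model from the paper: the one-dimensional corridor, the count-based novelty measure, the leaky average, and all names and parameter values (`select_policy`, `simulate`, `threshold`, `decay`) are assumptions chosen only to show how switching between an exploration policy and a retreat policy on a novelty threshold might look. The `information_gain` helper shows one common way to quantify information gain, as a Kullback-Leibler divergence between a belief before and after an observation; whether the paper uses exactly this form is not stated in the abstract.

```python
import math


def information_gain(prior, posterior):
    """KL divergence D(posterior || prior) for two discrete distributions.

    Hypothetical stand-in for an intrinsic reward; assumes prior[i] > 0
    wherever posterior[i] > 0.
    """
    return sum(q * math.log(q / p) for p, q in zip(prior, posterior) if q > 0)


def select_policy(novelty, threshold):
    """Toy action selection: explore while novelty is tolerable, else retreat."""
    return "explore" if novelty <= threshold else "retreat"


def simulate(steps=200, size=20, threshold=0.5, decay=0.7):
    """Run a toy agent in a 1D corridor whose home base is position 0.

    Novelty of a place is assumed to be 1 / (number of visits), and the
    experienced novelty is a leaky average of that quantity over time.
    """
    visits = [0] * size
    pos, novelty = 0, 0.0
    trace = []
    for _ in range(steps):
        visits[pos] += 1
        local = 1.0 / visits[pos]  # familiar places are less novel
        novelty = decay * novelty + (1 - decay) * local
        policy = select_policy(novelty, threshold)
        if policy == "explore":
            pos = min(pos + 1, size - 1)  # move away from home
        else:
            pos = max(pos - 1, 0)  # retreat toward home
        trace.append((pos, policy, novelty))
    return trace


if __name__ == "__main__":
    for pos, policy, novelty in simulate(steps=10):
        print(pos, policy, round(novelty, 3))
```

In this sketch the agent advances until the experienced novelty exceeds the threshold, retreats toward familiar positions until novelty decays, and then advances again, which gives a crude version of the alternation between excursions and retreats that the abstract describes.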

Acknowledgments

We thank Karl Friston, Daniel Polani and Fritz Sommer for helpful comments and discussions. This work was supported by the Israel Science Foundation grant #749/10 and the United States-Israel Bi-national Science Foundation grant #2011432. E.A. holds the Helen Diller Family Chair in Neurobiology.

Author information

Corresponding author

Correspondence to Goren Gordon.

Additional information

Action Editor: T. Sejnowski

Conflict of interests

The authors declare that they have no conflict of interest.

Cite this article

Gordon, G., Fonio, E. & Ahissar, E. Learning and control of exploration primitives. J Comput Neurosci 37, 259–280 (2014). https://doi.org/10.1007/s10827-014-0500-1

  • DOI: https://doi.org/10.1007/s10827-014-0500-1
