Intrinsic Motivation and Reinforcement Learning

Abstract

Psychologists distinguish between extrinsically motivated behavior, which is behavior undertaken to achieve some externally supplied reward, such as a prize, a high grade, or a high-paying job, and intrinsically motivated behavior, which is behavior done for its own sake. Is an analogous distinction meaningful for machine learning systems? Can we say of a machine learning system that it is motivated to learn, and if so, is it possible to provide it with an analog of intrinsic motivation? Despite the fact that a formal distinction between extrinsic and intrinsic motivation is elusive, this chapter argues that the answer to both questions is assuredly “yes” and that the machine learning framework of reinforcement learning is particularly appropriate for bringing learning together with what in animals one would call motivation. Despite the common perception that a reinforcement learning agent’s reward has to be extrinsic because the agent has a distinct input channel for reward signals, reinforcement learning provides a natural framework for incorporating principles of intrinsic motivation.

Notes

  1. The phrase computational RL is used here because this framework is not a theory of biological RL despite what it borrows from, and suggests about, biological RL. Throughout this chapter, RL refers to computational RL.

  2. RL certainly does not exclude analogs of innate behavioral patterns in artificial agents. The success of many systems using RL methods depends on the careful definition of innate behaviors, as in Hart and Grupen (2012).

  3. The term critic is used, and not “teacher”, because in machine learning a teacher provides more informative instructional information, such as directly telling the agent what its actions should have been instead of merely scoring them.

  4. It is important to note that the adaptive critic of these methods is inside the RL agent, while the different critic shown in Fig. 1, which provides the primary reward signal, is in the RL agent’s environment.

  5. Deci and Ryan (1985) mention that the term intrinsic motivation was first used by Harlow (1950) in a study showing that rhesus monkeys will spontaneously manipulate objects and work for hours to solve complicated mechanical puzzles without any explicit rewards.

  6. Schmidhuber (2009) would argue that it is the other way around: that control is a result of behavior directed to improve predictive models, which in this author’s opinion is at odds with what we know about evolution.

  7. These comments apply to the “passive” form of supervised learning and not necessarily to the extension known as “active learning” (Settles 2009), in which the learning agent itself chooses training examples. Although beyond this chapter’s scope, active supervised learning is indeed relevant to the subject of intrinsic motivation.

  8. We are relying on a commonsense notion of an organism’s boundary with its external environment, recognizing that this may not be easy to define.

  9. Figure 2 shows the organism containing a single RL agent, but an organism might contain many, each possibly having its own reward signal. Although not considered here, the multi-agent RL case (Busoniu et al. 2008) poses many challenges and opportunities.

References

  1. Ackley, D.H., Littman, M.: Interactions between learning and evolution. In: Langton, C., Taylor, C., Farmer, C., Rasmussen, S. (eds.) Artificial Life II (Proceedings Volume X in the Santa Fe Institute Studies in the Sciences of Complexity), pp. 487–509. Addison-Wesley, Reading (1991)

  2. Andry, P., Gaussier, P., Nadel, J., Hirsbrunner, B.: Learning invariant sensorimotor behaviors: A developmental approach to imitation mechanisms. Adap. Behav. 12, 117–140 (2004)

  3. Arkes, H.R., Garske, J.P.: Psychological Theories of Motivation. Brooks/Cole, Monterey (1982)

  4. Baranes, A., Oudeyer, P.-Y.: Intrinsically motivated goal exploration for active motor learning in robots: A case study. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010), Taipei, Taiwan 2010

  5. Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discr. Event Dynam. Syst. Theory Appl. 13, 341–379 (2003)

  6. Barto, A.G., Singh, S., Chentanez, N.: Intrinsically motivated learning of hierarchical collections of skills. In: Proceedings of the International Conference on Developmental Learning (ICDL), La Jolla, CA 2004

  7. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike elements that can solve difficult learning control problems. IEEE Trans. Sys. Man Cybern. 13, 835–846 (1983). Reprinted in J.A. Anderson and E. Rosenfeld (eds.), Neurocomputing: Foundations of Research, pp. 535–549, MIT, Cambridge (1988)

  8. Beck, R.C.: Motivation. Theories and Principles, 2nd edn. Prentice-Hall, Englewood Cliffs (1983)

  9. Berlyne, D.E.: A theory of human curiosity. Br. J. Psychol. 45, 180–191 (1954)

  10. Berlyne, D.E.: Conflict, Arousal, and Curiosity. McGraw-Hill, New York (1960)

  11. Berlyne, D.E.: Curiosity and exploration. Science 143, 25–33 (1966)

  12. Berlyne, D.E.: Aesthetics and Psychobiology. Appleton-Century-Crofts, New York (1971)

  13. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)

  14. Bindra, D.: How adaptive behavior is produced: A perceptual-motivational alternative to response reinforcement. Behav. Brain Sci. 1, 41–91 (1978)

  15. Breazeal, C., Brooks, A., Gray, J., Hoffman, G., Lieberman, J., Lee, H., Lockerd, A., Mulanda, D.: Tutelage and collaboration for humanoid robots. Int. J. Human. Robot. 1 (2004)

  16. Bush, V.: Science the endless frontier: A report to the president. Technical report (1945)

  17. Busoniu, L., Babuska, R., Schutter, B.D.: A comprehensive survey of multi-agent reinforcement learning. IEEE Trans. Syst. Man Cybern. C Appl. Rev. 38(2), 156–172 (2008)

  18. Cannon, W.B.: The Wisdom of the Body. W.W. Norton, New York (1932)

  19. Clark, W.A., Farley, B.G.: Generalization of pattern recognition in a self-organizing system. In: AFIPS ’55 (Western) Proceedings of the March 1–3, 1955, Western Joint Computer Conference, Los Angeles, CA, pp. 86–91. ACM, New York (1955)

  20. Cofer, C.N., Appley, M.H.: Motivation: Theory and Research. Wiley, New York (1964)

  21. Damoulas, T., Cos-Aguilera, I., Hayes, G.M., Taylor, T.: Valency for adaptive homeostatic agents: Relating evolution and learning. In: Capcarrere, M.S., Freitas, A.A., Bentley, P.J., Johnson, C.G., Timmis, J. (eds.) Advances in Artificial Life: 8th European Conference, ECAL 2005. Canterbury, UK LNAI vol. 3630, pp. 936–945. Springer, Berlin (2005)

  22. Daw, N.D., Shohamy, D.: The cognitive neuroscience of motivation and learning. Soc. Cogn. 26(5), 593–620 (2008)

  23. Dayan, P.: Motivated reinforcement learning. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14: Proceedings of the 2001 Conference, pp. 11–18. MIT, Cambridge (2001)

  24. Deci, E.L., Ryan, R.M.: Intrinsic Motivation and Self-Determination in Human Behavior. Plenum, New York (1985)

  25. Dember, W.N., Earl, R.W.: Analysis of exploratory, manipulatory, and curiosity behaviors. Psychol. Rev. 64, 91–96 (1957)

  26. Dember, W.N., Earl, R.W., Paradise, N.: Response by rats to differential stimulus complexity. J. Comp. Physiol. Psychol. 50, 514–518 (1957)

  27. Dickinson, A., Balleine, B.: The role of learning in the operation of motivational systems. In: Gallistel, R. (ed.) Handbook of Experimental Psychology, 3rd edn. Learning, Motivation, and Emotion, pp. 497–533. Wiley, New York (2002)

  28. Elfwing, S., Uchibe, E., Doya, K., Christensen, H.I.: Co-evolution of shaping rewards and meta-parameters in reinforcement learning. Adap. Behav. 16, 400–412 (2008)

  29. Epstein, A.: Instinct and motivation as explanations of complex behavior. In: Pfaff, D.W. (ed.) The Physiological Mechanisms of Motivation. Springer, New York (1982)

  30. Friston, K.J., Daunizeau, J., Kilner, J., Kiebel, S.J.: Action and behavior: A free-energy formulation. Biol. Cybern. (2010). Published online February 11, 2010

  31. Groos, K.: The Play of Man. D. Appleton, New York (1901)

  32. Harlow, H.F.: Learning and satiation of response in intrinsically motivated complex puzzle performance by monkeys. J. Comp. Physiol. Psychol. 43, 289–294 (1950)

  33. Harlow, H.F., Harlow, M.K., Meyer, D.R.: Learning motivated by a manipulation drive. J. Exp. Psychol. 40, 228–234 (1950)

  34. Hart, S., Grupen, R.: Intrinsically motivated affordance discovery and modeling. In: Baldassarre, G., Mirolli, M. (eds.) Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, Berlin (2012, this volume)

  35. Hebb, D.O.: The Organization of Behavior. Wiley, New York (1949)

  36. Hendrick, I.: Instinct and ego during infancy. Psychoanal. Quart. 11, 33–58 (1942)

  37. Hesse, F., Der, R., Herrmann, M., Michael, J.: Modulated exploratory dynamics can shape self-organized behavior. Adv. Complex Syst. 12(2), 273–292 (2009)

  38. Hull, C.L.: Principles of Behavior. D. Appleton-Century, New York (1943)

  39. Hull, C.L.: Essentials of Behavior. Yale University Press, New Haven (1951)

  40. Hull, C.L.: A Behavior System: An Introduction to Behavior Theory Concerning the Individual Organism. Yale University Press, New Haven (1952)

  41. Kimble, G.A.: Hilgard and Marquis’ Conditioning and Learning. Appleton-Century-Crofts, Inc., New York (1961)

  42. Klein, S.B.: Motivation. Biosocial Approaches. McGraw-Hill, New York (1982)

  43. Klopf, A.H.: Brain function and adaptive systems—A heterostatic theory. Technical report AFCRL-72-0164, Air Force Cambridge Research Laboratories, Bedford. A summary appears in Proceedings of the International Conference on Systems, Man, and Cybernetics, 1974, IEEE Systems, Man, and Cybernetics Society, Dallas (1972)

  44. Klopf, A.H.: The Hedonistic Neuron: A Theory of Memory, Learning, and Intelligence. Hemisphere, Washington (1982)

  45. Lenat, D.B.: AM: An artificial intelligence approach to discovery in mathematics. Ph.D. Thesis, Stanford University (1976)

  46. Linden, D.J.: The Compass of Pleasure: How Our Brains Make Fatty Foods, Orgasm, Exercise, Marijuana, Generosity, Vodka, Learning, and Gambling Feel So Good. Viking, New York (2011)

  47. Littman, M.L., Ackley, D.H.: Adaptation in constant utility nonstationary environments. In: Proceedings of the Fourth International Conference on Genetic Algorithms, San Diego, CA pp. 136–142 (1991)

  48. Lungarella, M., Metta, G., Pfeiffer, R., Sandini, G.: Developmental robotics: A survey. Connect. Sci. 15, 151–190 (2003)

  49. Mackintosh, N.J.: Conditioning and Associative Learning. Oxford University Press, New York (1983)

  50. McFarland, D., Bösser, T.: Intelligent Behavior in Animals and Robots. MIT, Cambridge (1993)

  51. Mendel, J.M., Fu, K.S. (eds.): Adaptive, Learning, and Pattern Recognition Systems: Theory and Applications. Academic, New York (1970)

  52. Mendel, J.M., McLaren, R.W.: Reinforcement learning control and pattern recognition systems. In: Mendel, J.M., Fu, K.S. (eds.) Adaptive, Learning, and Pattern Recognition Systems: Theory and Applications, pp. 287–318. Academic, New York (1970)

  53. Michie, D., Chambers, R.A.: BOXES: An experiment in adaptive control. In: Dale, E., Michie, D. (eds.) Machine Intelligence 2, pp. 137–152. Oliver and Boyd, Edinburgh (1968)

  54. Minsky, M.L.: Theory of neural-analog reinforcement systems and its application to the brain-model problem. Ph.D. Thesis, Princeton University (1954)

  55. Minsky, M.L.: Steps toward artificial intelligence. Proc. Inst. Radio Eng. 49, 8–30 (1961). Reprinted in E.A. Feigenbaum and J. Feldman (eds.) Computers and Thought, pp. 406–450. McGraw-Hill, New York (1963)

  56. Mollenauer, S.O.: Shifts in deprivations level: Different effects depending on the amount of preshift training. Learn. Motiv. 2, 58–66 (1971)

  57. Narendra, K., Thathachar, M.A.L.: Learning Automata: An Introduction. Prentice Hall, Englewood Cliffs (1989)

  58. Olds, J., Milner, P.: Positive reinforcement produced by electrical stimulation of septal areas and other regions of rat brain. J. Comp. Physiol. Psychol. 47, 419–427 (1954)

  59. Oudeyer, P.-Y., Kaplan, F.: What is intrinsic motivation? A typology of computational approaches. Front. Neurorobot. 1:6, doi: 10.3389/neuro.12.006.2007 (2007)

  60. Oudeyer, P.-Y., Kaplan, F., Hafner, V.: Intrinsic motivation systems for autonomous mental development. IEEE Trans. Evol. Comput. 11, 265–286 (2007)

  61. Petri, H.L.: Motivation: Theory and Research. Wadsworth Publishing Company, Belmont (1981)

  62. Piaget, J.: The Origins of Intelligence in Children. Norton, New York (1952)

  63. Picard, R.W.: Affective Computing. MIT, Cambridge (1997)

  64. Prince, C.G., Demiris, Y., Marom, Y., Kozima, H., Balkenius, C. (eds.): Proceedings of the Second International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, vol. 94. Lund University, Lund (2001)

  65. Rescorla, R.A., Wagner, A.R.: A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: Black, A.H., Prokasy, W.F. (eds.) Classical Conditioning, vol. II, pp. 64–99. Appleton-Century-Crofts, New York (1972)

  66. Rosenblatt, F.: Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington (1962)

  67. Rumelhart, D., Hinton, G., Williams, R.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)

  68. Ryan, R.M., Deci, E.L.: Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemp. Educ. Psychol. 25, 54–67 (2000)

  69. Samuelson, L.: Introduction to the evolution of preferences. J. Econ. Theory 97, 225–230 (2001)

  70. Samuelson, L., Swinkels, J.: Information, evolution, and utility. Theor. Econ. 1, 119–142 (2006)

  71. Savage, T.: Artificial motives: A review of motivation in artificial creatures. Connect. Sci. 12, 211–277 (2000)

  72. Schembri, M., Mirolli, M., Baldassarre, G.: Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot. In: Proceedings of the 6th International Conference on Development and Learning (ICDL2007), Imperial College, London 2007

  73. Schmidhuber, J.: Adaptive confidence and adaptive curiosity. Technical report FKI-149-91, Institut für Informatik, Technische Universität München (1991a)

  74. Schmidhuber, J.: A possibility for implementing curiosity and boredom in model-building neural controllers. In: From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, pp. 222–227. MIT, Cambridge (1991b)

  75. Schmidhuber, J.: What’s interesting? Technical report TR-35-97. IDSIA, Lugano (1997)

  76. Schmidhuber, J.: Artificial curiosity based on discovering novel algorithmic predictability through coevolution. In: Proceedings of the Congress on Evolutionary Computation, vol. 3, pp. 1612–1618. IEEE (1999)

  77. Schmidhuber, J.: Driven by compression progress: A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. In: Pezzulo, G., Butz, M.V., Sigaud, O., Baldassarre, G. (eds.) Anticipatory Behavior in Adaptive Learning Systems. From Psychological Theories to Artificial Cognitive Systems, pp. 48–76. Springer, Berlin (2009)

  78. Schultz, W.: Predictive reward signal of dopamine neurons. J. Neurophysiol. 80(1), 1–27 (1998)

  79. Schultz, W.: Reward. Scholarpedia 2(3), 1652 (2007a)

  80. Schultz, W.: Reward signals. Scholarpedia 2(6), 2184 (2007b)

  81. Scott, P.D., Markovitch, S.: Learning novel domains through curiosity and conjecture. In: Sridharan, N.S. (ed.) Proceedings of the 11th International Joint Conference on Artificial Intelligence, Detroit, MI pp. 669–674. Morgan Kaufmann, San Francisco (1989)

  82. Settles, B.: Active learning literature survey. Technical Report 1648, Computer Sciences, University of Wisconsin-Madison, Madison (2009)

  83. Singh, S., Barto, A.G., Chentanez, N.: Intrinsically motivated reinforcement learning. In: Advances in Neural Information Processing Systems 17: Proceedings of the 2004 Conference. MIT, Cambridge (2005)

  84. Singh, S., Lewis, R.L., Barto, A.G.: Where do rewards come from? In: Taatgen, N., van Rijn, H. (eds.) Proceedings of the 31st Annual Conference of the Cognitive Science Society, Amsterdam pp. 2601–2606. Cognitive Science Society (2009)

  85. Singh, S., Lewis, R.L., Barto, A.G., Sorg, J.: Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Trans. Auton. Mental Dev. 2(2), 70–82 (2010). Special issue on Active Learning and Intrinsically Motivated Exploration in Robots: Advances and Challenges

  86. Snel, M., Hayes, G.M.: Evolution of valence systems in an unstable environment. In: Proceedings of the 10th International Conference on Simulation of Adaptive Behavior: From Animals to Animats, Osaka, M. Asada, J.C. Hallam, J.-A. Meyer (Eds.) pp. 12–21 (2008)

  87. Sorg, J., Singh, S., Lewis, R.L.: Internal rewards mitigate agent boundedness. In: Fürnkranz, J., Joachims, T. (eds.) Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, Omnipress pp. 1007–1014 (2010)

  88. Sutton, R.S.: Reinforcement learning architectures for animats. In: From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, J.-A. Meyer, S.W.Wilson (Eds.) pp. 288–296. MIT, Cambridge (1991)

  89. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT, Cambridge (1998)

  90. Sutton, R.S., Precup, D., Singh, S.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artif. Intell. 112, 181–211 (1999)

  91. Tesauro, G.J.: TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6(2), 215–219 (1994)

  92. Thomaz, A.L., Breazeal, C.: Transparency and socially guided machine learning. In: Proceedings of the 5th International Conference on Developmental Learning (ICDL) Bloomington, IN (2006)

  93. Thomaz, A.L., Hoffman, G., Breazeal, C.: Experiments in socially guided machine learning: Understanding how humans teach. In: Proceedings of the 1st Annual conference on Human-Robot Interaction (HRI) Salt Lake City, UT (2006)

  94. Thorndike, E.L.: Animal Intelligence. Hafner, Darien (1911)

  95. Toates, F.M.: Motivational Systems. Cambridge University Press, Cambridge (1986)

  96. Tolman, E.C.: Purposive Behavior in Animals and Men. Naiburg, New York (1932)

  97. Trappl, R., Petta, P., Payr, S. (eds.): Emotions in Humans and Artifacts. MIT, Cambridge (1997)

  98. Uchibe, E., Doya, K.: Finding intrinsic rewards by embodied evolution and constrained reinforcement learning. Neural Netw. 21(10), 1447–1455 (2008)

  99. Waltz, M.D., Fu, K.S.: A heuristic approach to reinforcement learning control systems. IEEE Transactions on Automatic Control 10, 390–398 (1965)

  100. Weng, J., McClelland, J., Pentland, A., Sporns, O., Stockman, I., Sur, M., Thelen, E.: Autonomous mental development by robots and animals. Science 291, 599–600 (2001)

  101. Werbos, P.J.: Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Trans. Sys. Man Cybern. 17, 7–20 (1987)

  102. White, R.W.: Motivation reconsidered: The concept of competence. Psychol. Rev. 66, 297–333 (1959)

  103. Widrow, B., Gupta, N.K., Maitra, S.: Punish/reward: Learning with a critic in adaptive threshold systems. IEEE Trans. Sys. Man Cybern. 3, 455–465 (1973)

  104. Widrow, B., Hoff, M.E.: Adaptive switching circuits. In: 1960 WESCON Convention Record Part IV, pp. 96–104. Institute of Radio Engineers, New York (1960). Reprinted in J.A. Anderson and E. Rosenfeld, Neurocomputing: Foundations of Research, pp. 126–134. MIT, Cambridge (1988)

  105. Young, P.T.: Hedonic organization and regulation of behavior. Psychol. Rev. 73, 59–86 (1966)

Acknowledgements

The author thanks Satinder Singh, Rich Lewis, and Jonathan Sorg for developing the evolutionary perspective on this subject and for their important insights, and colleagues Sridhar Mahadevan and Rod Grupen, along with current and former members of the Autonomous Learning Laboratory who have participated in discussing intrinsically motivated reinforcement learning: Bruno Castro da Silva, Will Dabney, Jody Fanto, George Konidaris, Scott Kuindersma, Scott Niekum, Özgür Şimşek, Andrew Stout, Phil Thomas, Chris Vigorito, and Pippin Wolfe. The author thanks Pierre-Yves Oudeyer for his many helpful suggestions, especially regarding non-RL approaches to intrinsic motivation. Special thanks are due to Prof. John W. Moore, whose expert guidance through the psychology literature was essential to writing this chapter, though any misrepresentation of psychological thought is strictly due to the author. This research has benefitted immensely from the author’s association with the European Community 7th Framework Programme (FP7/2007-2013), “Challenge 2: Cognitive Systems, Interaction, Robotics”, Grant Agreement No. ICT-IP-231722, project “IM-CLeVeR: Intrinsically Motivated Cumulative Learning Versatile Robots.” Some of the research described here was supported by the National Science Foundation under Grant No. IIS-0733581 and by the Air Force Office of Scientific Research under grant FA9550-08-1-0418. Any opinions, findings, conclusions, or recommendations expressed here are those of the author and do not necessarily reflect the views of the sponsors.

Author information

Corresponding author

Correspondence to Andrew G. Barto.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Barto, A.G. (2013). Intrinsic Motivation and Reinforcement Learning. In: Baldassarre, G., Mirolli, M. (eds) Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32375-1_2

  • DOI: https://doi.org/10.1007/978-3-642-32375-1_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32374-4

  • Online ISBN: 978-3-642-32375-1

  • eBook Packages: Computer Science, Computer Science (R0)
