Reinforcement Learning: Psychological and Neurobiological Aspects

  • Technical Contribution
  • Published in: KI - Künstliche Intelligenz

Abstract

Mathematical models of neurobiologically and psychologically inspired learning paradigms are regarded as a key technology for problems that are hard to solve with classical programming. Reinforcement learning is one of these paradigms and is by now applied quite successfully in practice (among other areas, in robotics) to learn behavior through trial and error. In this article I take a closer look at the related neurobiological and psychological aspects, which serve as the template for a large number of mathematical models. Taken as a whole, reinforcement learning is not solely responsible for learning in the brains of humans and animals. Instead, there is a remarkable interplay of several paradigms from different brain areas, in which supervised and unsupervised learning also take part.
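As a minimal, self-contained illustration of the trial-and-error learning described above (my own sketch, not code from the article), the following Python program learns a behavior by tabular Q-learning in a hypothetical toy chain world; the names (ChainEnv, q_learning) and all parameter values are illustrative assumptions.

    import random

    # Hypothetical toy world: a chain of states; taking "right" in the last
    # state earns reward 1 and ends the episode, every other step earns 0.
    class ChainEnv:
        def __init__(self, n_states=5):
            self.n_states = n_states
            self.state = 0

        def reset(self):
            self.state = 0
            return self.state

        def step(self, action):  # action: 0 = left, 1 = right
            if action == 1 and self.state == self.n_states - 1:
                return self.state, 1.0, True  # goal reached, episode ends
            self.state = max(0, min(self.n_states - 1,
                                    self.state + (1 if action == 1 else -1)))
            return self.state, 0.0, False

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = [[0.0, 0.0] for _ in range(env.n_states)]  # tabular Q(s, a)
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # Epsilon-greedy trial and error: mostly exploit, sometimes explore.
                if random.random() < epsilon:
                    a = random.randrange(2)
                else:
                    a = 0 if Q[s][0] >= Q[s][1] else 1
                s_next, r, done = env.step(a)
                # Temporal-difference update toward the bootstrapped target;
                # terminal transitions use the plain reward as target.
                target = r if done else r + gamma * max(Q[s_next])
                Q[s][a] += alpha * (target - Q[s][a])
                s = s_next
        return Q

    print(q_learning(ChainEnv()))

After training, the greedy policy in every state is to move right, which is exactly the behavior the toy environment rewards.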


Notes

  1. In batch training, the error is minimized offline over a set of several input-output patterns, rather than online for each individual pattern (see the sketch after these notes).

  2. For all s, a, the prediction error in Eq. (2) must be zero (see the condition spelled out after these notes).
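Note 1 can be made concrete with a small sketch (again my own illustration under assumed names and numbers, not code from the article): the same squared-error objective on a one-weight linear model is minimized once per individual input-output pattern (online) and once per sweep over the whole pattern set (batch).

    # Online vs. batch minimization of a squared error over input-output patterns.
    patterns = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # (input x, target y)
    alpha = 0.01  # learning rate

    # Online training: one gradient step after every single pattern.
    w_online = 0.0
    for _ in range(200):
        for x, y in patterns:
            w_online += alpha * (y - w_online * x) * x

    # Batch training: one gradient step on the error accumulated offline
    # over the entire set of patterns.
    w_batch = 0.0
    for _ in range(200):
        w_batch += alpha * sum((y - w_batch * x) * x for x, y in patterns)

    print(w_online, w_batch)  # both settle near w = 2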
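Equation (2) itself is not reproduced in this preview; assuming it denotes the standard temporal-difference error of Q-learning (an assumption on my part), note 2 states the fixed-point condition reached once learning has converged, which for a deterministic transition from s to s' is the Bellman optimality equation:

    % Assumed reading of note 2: the TD error delta vanishes for every
    % state-action pair, i.e. Q satisfies the Bellman optimality equation.
    \delta(s,a) = r(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a) = 0
        \quad \forall\, s,a
    \;\Longleftrightarrow\;
    Q(s,a) = r(s,a) + \gamma \max_{a'} Q(s',a') \quad \forall\, s,a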


Author information

Correspondence to Michel Tokic.


Cite this article

Tokic, M. Reinforcement Learning: Psychologische und neurobiologische Aspekte. Künstl Intell 27, 213–219 (2013). https://doi.org/10.1007/s13218-013-0261-4
