Abstract.
This paper presents an optimal learning control method that uses reinforcement learning for biological systems with redundant actuators. Reinforcement learning is difficult to apply to biological control systems because the muscle activation space is redundant. We address this problem as follows. First, we divide the control input space into two subspaces according to a learning priority order and restrict the reinforcement-learning search noise to the first-priority subspace. Then, as learning progresses, this constraint is gradually relaxed so that the search extends into the second-priority subspace. The higher-priority subspace is designed so that the arm's impedance can be made high. Through reinforcement learning, a smooth reaching motion is obtained without any prior knowledge of the arm's dynamics.
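The prioritized search described above can be illustrated with a minimal sketch, assuming the muscle-activation space is split by an orthonormal basis into a first-priority subspace (spanned by `b1`) and a second-priority subspace (spanned by `b2`), and assuming a linear schedule for relaxing the constraint. All names (`split_subspaces`, `relaxation`, `exploration_noise`), the six-muscle size, the random basis, and the schedule parameters `start` and `ramp` are hypothetical choices for illustration. The sketch does not reproduce the paper's impedance-based design of the subspaces and does not model activation limits.

```python
import numpy as np


def split_subspaces(basis: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Split an orthonormal basis (as columns) of the muscle-activation space into
    a first-priority subspace (first k columns) and a second-priority subspace."""
    return basis[:, :k], basis[:, k:]


def relaxation(episode: int, start: int, ramp: int) -> float:
    """Hypothetical linear schedule: 0 before `start`, rising to 1 over `ramp` episodes."""
    return float(np.clip((episode - start) / ramp, 0.0, 1.0))


def exploration_noise(b1, b2, sigma, episode, rng, start=200, ramp=300):
    """Gaussian search noise confined to span(b1) early in learning; the component
    in span(b2) is scaled up by the relaxation schedule as learning progresses."""
    eps1 = rng.normal(0.0, sigma, size=b1.shape[1])
    eps2 = rng.normal(0.0, sigma, size=b2.shape[1])
    return b1 @ eps1 + relaxation(episode, start, ramp) * (b2 @ eps2)


# Usage with assumed sizes: 6 muscles, 2-dimensional first-priority subspace.
rng = np.random.default_rng(0)
n_muscles, k = 6, 2
basis, _ = np.linalg.qr(rng.normal(size=(n_muscles, n_muscles)))  # orthonormal columns
b1, b2 = split_subspaces(basis, k)

early = exploration_noise(b1, b2, 0.1, episode=0, rng=rng)
late = exploration_noise(b1, b2, 0.1, episode=1000, rng=rng)
print(np.abs(b2.T @ early).max())  # zero up to floating-point rounding: no second-priority search yet
print(np.abs(b2.T @ late).max())   # clearly nonzero once the constraint has been relaxed
```

The noise would be added to the policy's muscle-activation output at each step; projecting onto `b2` confirms that early exploration stays entirely within the first-priority subspace.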
Cite this article
Izawa, J., Kondo, T. & Ito, K. Biological arm motion through reinforcement learning. Biol. Cybern. 91, 10–22 (2004). https://doi.org/10.1007/s00422-004-0485-3
DOI: https://doi.org/10.1007/s00422-004-0485-3