Abstract
Reinforcement learning (RL) enables agents to learn actions that maximize reward in an interactive environment. The environment is typically described as a predefined Markov Decision Process (MDP), which is assumed to remain unchanged throughout the agent's lifetime. RL therefore faces challenges when the environment does change. First, the agent must detect the change rapidly when it occurs. Second, the agent must retain the knowledge learned before the change, so that when it re-encounters a previously experienced environment, it can recall and reuse that knowledge. To address these two challenges, we developed a biologically inspired neuromodulation system that enables RL agents to quickly detect and adapt to environment changes. Our neuromodulation system is inspired by the roles of the cholinergic (ACh) and noradrenergic (NE) neuromodulatory systems in tracking expected and unexpected uncertainty in the environment. Experiments in the Gridworld environment and on a simulated MuJoCo robot demonstrate the efficacy of our approach.
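As a rough sketch of the underlying idea (illustrative only, not the paper's implementation), expected uncertainty can be tracked as a running average of past prediction-error magnitudes, loosely analogous to an ACh-like signal, while an NE-like surprise signal flags an environment change whenever the current error greatly exceeds that running estimate. The function name, `alpha`, and `threshold` below are hypothetical choices for illustration:

```python
def detect_change(prediction_errors, alpha=0.1, threshold=3.0):
    """Flag time steps where unexpected uncertainty (current
    prediction error) greatly exceeds expected uncertainty
    (a running average of past error magnitudes).

    Illustrative sketch; parameter names and values are assumptions.
    """
    expected = abs(prediction_errors[0])  # ACh-like expected uncertainty
    flags = []
    for e in prediction_errors:
        # NE-like surprise: error far beyond what is expected
        surprise = abs(e) > threshold * max(expected, 1e-6)
        flags.append(surprise)
        # Slowly update the expected-uncertainty estimate
        expected = (1 - alpha) * expected + alpha * abs(e)
    return flags
```

Under this sketch, a stream of small errors followed by a sudden large one would trigger a change flag only at the jump, which is the behavior an RL agent needs to decide when to switch or re-learn its policy.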
Acknowledgements
This material is partially based upon work supported by the United States Air Force and DARPA under Contract No. FA8750-18-C-0103. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force and DARPA. The authors also thank CHASE-CI, supported under NSF Grant CNS-1730158, for providing computing resources.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Xing, J., Zou, X., Pilly, P.K., Ketz, N.A., Krichmar, J.L. (2022). Adapting to Environment Changes Through Neuromodulation of Reinforcement Learning. In: Cañamero, L., Gaussier, P., Wilson, M., Boucenna, S., Cuperlier, N. (eds) From Animals to Animats 16. SAB 2022. Lecture Notes in Computer Science(), vol 13499. Springer, Cham. https://doi.org/10.1007/978-3-031-16770-6_10
Print ISBN: 978-3-031-16769-0
Online ISBN: 978-3-031-16770-6