Abstract
Reinforcement learning (RL) enables agents to learn actions that maximize reward in an interactive environment. The environment is typically described as a predefined Markov Decision Process (MDP), which is assumed to remain unchanged throughout the agent's lifetime. RL therefore faces challenges when the environment does change. First, the agent must detect the change rapidly when it occurs. Second, the agent must retain the knowledge learned before the change, so that when it re-encounters a previously experienced environment, it can recall and reuse that knowledge. To address these two challenges, we developed a biologically inspired neuromodulation system that enables RL agents to quickly detect and adapt to environment changes. Our neuromodulation system is inspired by the roles of the cholinergic (ACh) and noradrenergic (NE) neuromodulatory systems in tracking expected and unexpected uncertainty in the environment. Experiments in the Gridworld environment and on a simulated MuJoCo robot demonstrate the efficacy of our approach.
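As a rough sketch of the underlying idea (illustrative only, not the paper's implementation), expected uncertainty can be tracked as a running average of past prediction-error magnitudes, loosely analogous to an ACh-like signal, while an NE-like surprise signal flags an environment change whenever the current error greatly exceeds that running estimate. The function name, `alpha`, and `threshold` below are hypothetical choices for illustration:

```python
def detect_change(prediction_errors, alpha=0.1, threshold=3.0):
    """Flag time steps where unexpected uncertainty (current
    prediction error) greatly exceeds expected uncertainty
    (a running average of past error magnitudes).

    Illustrative sketch; parameter names and values are assumptions.
    """
    expected = abs(prediction_errors[0])  # ACh-like expected uncertainty
    flags = []
    for e in prediction_errors:
        # NE-like surprise: error far beyond what is expected
        surprise = abs(e) > threshold * max(expected, 1e-6)
        flags.append(surprise)
        # Slowly update the expected-uncertainty estimate
        expected = (1 - alpha) * expected + alpha * abs(e)
    return flags
```

Under this sketch, a stream of small errors followed by a sudden large one would trigger a change flag only at the jump, which is the behavior an RL agent needs to decide when to switch or re-learn its policy.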
Acknowledgements
This material is partially based upon work supported by the United States Air Force and DARPA under Contract No. FA8750-18-C-0103. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force and DARPA. The authors also thank CHASE-CI, supported under NSF Grant CNS-1730158, for providing computing resources.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Xing, J., Zou, X., Pilly, P.K., Ketz, N.A., Krichmar, J.L. (2022). Adapting to Environment Changes Through Neuromodulation of Reinforcement Learning. In: Cañamero, L., Gaussier, P., Wilson, M., Boucenna, S., Cuperlier, N. (eds) From Animals to Animats 16. SAB 2022. Lecture Notes in Computer Science(), vol 13499. Springer, Cham. https://doi.org/10.1007/978-3-031-16770-6_10
Print ISBN: 978-3-031-16769-0
Online ISBN: 978-3-031-16770-6