Adapting to Environment Changes Through Neuromodulation of Reinforcement Learning

  • Conference paper
From Animals to Animats 16 (SAB 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13499)


Abstract

Reinforcement learning (RL) enables agents to learn actions that maximize reward in an interactive environment. The environment is typically described as a predefined Markov Decision Process (MDP) that is assumed to remain unchanged throughout the agent's lifetime. RL therefore faces challenges when the environment changes. First, the agent must rapidly detect the change when it occurs. Second, the agent must retain the knowledge learned before the change, so that when it re-encounters a previously experienced environment it can recall and reuse that knowledge. To address these two challenges, we developed a biologically inspired neuromodulation system that enables RL agents to quickly detect and adapt to environment changes. Our neuromodulation system is inspired by the roles of the cholinergic (ACh) and noradrenergic (NE) neuromodulatory systems in tracking uncertainties in the environment. We demonstrate the efficacy of our approach with experiments in the Gridworld environment and on a simulated MuJoCo robot.
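The mechanism described in the abstract can be sketched in code. The following is a hypothetical illustration under stated assumptions, not the authors' implementation: an ACh-like signal tracks expected uncertainty as a running average of the absolute TD error, and an NE-like surprise signal fires when the current error greatly exceeds that expectation, triggering a context switch that preserves the old Q-table. All class, parameter, and threshold choices here are illustrative assumptions.

```python
class NeuromodulatedQLearner:
    """Tabular Q-learning with an ACh/NE-style change detector (sketch).

    `ach` tracks expected uncertainty (running average of |TD error|,
    a cholinergic role); when the current |TD error| exceeds that
    expectation by `ne_threshold`, an NE-like surprise signal fires.
    On detection the agent keeps its current Q-table and opens a new
    one, so knowledge of the old environment is retained.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 ach_rate=0.05, ne_threshold=3.0):
        self.n_states, self.n_actions = n_states, n_actions
        self.alpha, self.gamma = alpha, gamma
        self.ach_rate = ach_rate          # adaptation speed of expected uncertainty
        self.ne_threshold = ne_threshold  # surprise ratio that triggers a switch
        self.contexts = [self._new_table()]  # one Q-table per detected environment
        self.ctx = 0                      # index of the active context
        self.ach = 1.0                    # expected-uncertainty estimate

    def _new_table(self):
        return [[0.0] * self.n_actions for _ in range(self.n_states)]

    def update(self, s, a, r, s_next):
        """One Q-learning step; returns True if a change was detected."""
        Q = self.contexts[self.ctx]
        td_error = r + self.gamma * max(Q[s_next]) - Q[s][a]
        surprise = abs(td_error) / max(self.ach, 1e-6)
        if surprise > self.ne_threshold:
            # NE-like phasic burst: freeze the old Q-table, open a new context
            self.contexts.append(self._new_table())
            self.ctx = len(self.contexts) - 1
            self.ach = abs(td_error)      # reset the uncertainty baseline
            return True
        Q[s][a] += self.alpha * td_error
        # ACh-like slow tracking of expected uncertainty
        self.ach += self.ach_rate * (abs(td_error) - self.ach)
        return False
```

In a stationary environment the TD error and the ACh estimate shrink together, so the surprise ratio stays near one; when the reward structure suddenly changes, the TD error jumps while the ACh baseline is still small, and the ratio crosses the threshold immediately.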



Acknowledgements

This material is based in part upon work supported by the United States Air Force and DARPA under Contract No. FA8750-18-C-0103. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force or DARPA. The authors also thank CHASE-CI for computing resources provided under NSF Grant CNS-1730158.

Author information

Correspondence to Jinwei Xing.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xing, J., Zou, X., Pilly, P.K., Ketz, N.A., Krichmar, J.L. (2022). Adapting to Environment Changes Through Neuromodulation of Reinforcement Learning. In: Cañamero, L., Gaussier, P., Wilson, M., Boucenna, S., Cuperlier, N. (eds) From Animals to Animats 16. SAB 2022. Lecture Notes in Computer Science (LNAI), vol. 13499. Springer, Cham. https://doi.org/10.1007/978-3-031-16770-6_10

  • DOI: https://doi.org/10.1007/978-3-031-16770-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16769-0

  • Online ISBN: 978-3-031-16770-6

  • eBook Packages: Computer Science, Computer Science (R0)
