Non-stationarity Detection in Model-Free Reinforcement Learning via Value Function Monitoring

Hussein, Maryem; Keshk, Marwa; Hussein, Aya

doi:10.1007/978-981-99-8391-9_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14472)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

The remarkable success of reinforcement learning (RL) in recent years has been mostly confined to stationary environments. In realistic settings, RL agents can encounter non-stationarity when the environmental dynamics change over time. Detecting when such a change occurs is crucial for activating adaptation mechanisms at the right time. Existing research on change detection mostly relies on model-based techniques, which are challenging to apply to tasks with large state and action spaces. In this paper, we propose a model-free, low-cost approach based on value functions (V or Q) for detecting non-stationarity. The proposed approach calculates the change in the value function (\(\varDelta V\) or \(\varDelta Q\)) and monitors the distribution of this change over time. Statistical hypothesis testing is used to detect whether the distribution of \(\varDelta V\) or \(\varDelta Q\) changes significantly over time, which reflects non-stationarity. We evaluate the proposed approach in three benchmark RL environments and show that it can successfully detect non-stationarity when changes in the environmental dynamics are introduced at different magnitudes and speeds. Our experiments also show that changes in \(\varDelta V\) or \(\varDelta Q\) can be used for context identification, reaching a classification accuracy of up to 88%.
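The abstract does not specify which hypothesis test, window sizes, or aggregation of \(\varDelta V\) the authors use, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a two-sample Kolmogorov-Smirnov (KS) test comparing a frozen reference window of per-step value updates against a sliding window of the most recent ones, with an asymptotic p-value. The names `ks_statistic`, `ks_pvalue`, and `DeltaValueMonitor`, the window size, the significance level, and the one-state TD(0) toy demo are all hypothetical illustrations, not the paper's method.

```python
import math
import random
from collections import deque


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance past every sample <= x so both empirical CDFs are evaluated at x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value of D via the Kolmogorov distribution series."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d  # small-sample correction
    if lam < 0.2:
        return 1.0  # series converges slowly here, and the p-value is ~1 anyway
    s = sum(2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(max(s, 0.0), 1.0)


class DeltaValueMonitor:
    """Hypothetical detector: flags a change when recent value updates stop
    matching a frozen reference window (assumed design, not the paper's)."""

    def __init__(self, window=200, alpha=1e-3):
        self.window = window
        self.alpha = alpha
        self.reference = []                 # first `window` deltas, then frozen
        self.recent = deque(maxlen=window)  # sliding window of the latest deltas

    def update(self, delta):
        """Record one Delta V (or Delta Q) value; return True if a change is detected."""
        if len(self.reference) < self.window:
            self.reference.append(delta)
            return False
        self.recent.append(delta)
        if len(self.recent) < self.window:
            return False
        d = ks_statistic(self.reference, self.recent)
        return ks_pvalue(d, len(self.reference), len(self.recent)) < self.alpha


if __name__ == "__main__":
    # Toy demo (hypothetical setup): a one-state TD(0) value estimate with
    # gamma = 0, where the reward mean jumps from 1.0 to 3.0 at step 3000.
    rng = random.Random(0)
    v, lr = 0.0, 0.05
    monitor = DeltaValueMonitor(window=200, alpha=1e-3)
    detected_at = None
    for t in range(5000):
        mean = 1.0 if t < 3000 else 3.0
        r = rng.gauss(mean, 0.5)
        delta_v = lr * (r - v)  # the TD(0) update applied to V
        v += delta_v
        # Start monitoring after a warm-up, once V has roughly converged.
        if t >= 500 and monitor.update(delta_v) and detected_at is None:
            detected_at = t
    print("change injected at t=3000, first detection at t =", detected_at)
```

The warm-up before monitoring reflects an assumption of this sketch: while the agent is still learning, \(\varDelta V\) shrinks naturally, which would itself look like a distribution change. Because the test is repeated at every step, the effective false-alarm rate is higher than `alpha`, so a stricter threshold, a cooldown after each detection, or a sequential change-point test would be reasonable refinements. Per-step \(\varDelta Q\) values from a Q-learning agent could be fed to the same monitor.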

Acknowledgement

This work was funded by the Department of Defence and the Office of National Intelligence under the AI for Decision Making Program, delivered in partnership with the NSW Defence Innovation Network Grant Number RG213520.

Author information

Authors and Affiliations

Faculty of Engineering, Cairo University, Giza, Egypt
Maryem Hussein
School of Professional Studies, University of New South Wales, Canberra, Australia
Marwa Keshk
School of Systems and Computing, University of New South Wales, Canberra, Australia
Aya Hussein

Corresponding author

Correspondence to Aya Hussein.

Editor information

Editors and Affiliations

The University of Sydney, Darlington, NSW, Australia
Tongliang Liu
Monash University, Clayton, VIC, Australia
Geoff Webb
The University of Newcastle, Callaghan, NSW, Australia
Lin Yue
CSIRO Data61, Sydney, NSW, Australia
Dadong Wang

About this paper

Cite this paper

Hussein, M., Keshk, M., Hussein, A. (2024). Non-stationarity Detection in Model-Free Reinforcement Learning via Value Function Monitoring. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_28

DOI: https://doi.org/10.1007/978-981-99-8391-9_28
Published: 27 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8390-2
Online ISBN: 978-981-99-8391-9
eBook Packages: Computer Science, Computer Science (R0)