Abstract
The remarkable success achieved by reinforcement learning (RL) in recent years is mostly confined to stationary environments. In realistic settings, RL agents can encounter non-stationarity when the environmental dynamics change over time. Detecting when such a change occurs is crucial for activating adaptation mechanisms at the right time. Existing research on change detection mostly relies on model-based techniques, which are challenging to apply to tasks with large state and action spaces. In this paper, we propose a model-free, low-cost approach based on value functions (V or Q) for detecting non-stationarity. The proposed approach calculates the change in the value function (\(\varDelta V\) or \(\varDelta Q\)) and monitors the distribution of this change over time. Statistical hypothesis testing is used to detect whether the distribution of \(\varDelta V\) or \(\varDelta Q\) changes significantly over time, which indicates non-stationarity. We evaluate the proposed approach in three benchmark RL environments and show that it can successfully detect non-stationarity when changes in the environmental dynamics are introduced at different magnitudes and speeds. Our experiments also show that changes in \(\varDelta V\) or \(\varDelta Q\) can be used for context identification, with a classification accuracy of up to 88%.
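The monitoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the statistical test, the window scheme, or the exact definition of \(\varDelta V\), so the sketch assumes that \(\varDelta V\) is the difference between consecutive value estimates, that a fixed reference window is compared against later non-overlapping windows, and that a two-sample Kolmogorov-Smirnov test with an asymptotic p-value is used. The names `value_deltas`, `ks_two_sample`, and `detect_change`, and the parameters `window` and `alpha`, are hypothetical.

```python
import math


def value_deltas(values):
    """Assumed definition of Delta V: difference between consecutive value estimates."""
    return [b - a for a, b in zip(values, values[1:])]


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic and an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the two ECDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d  # small-sample correction to the asymptotic formula
    if lam < 0.2:
        # The alternating series below is unreliable here; the true p-value is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


def detect_change(deltas, window=100, alpha=0.01):
    """Return the start index of the first window whose Delta V distribution
    differs significantly from the reference window, or None if none does."""
    reference = deltas[:window]
    for start in range(window, len(deltas) - window + 1, window):
        recent = deltas[start:start + window]
        _, p = ks_two_sample(reference, recent)
        if p < alpha:
            return start
    return None


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    # Synthetic Delta V stream (illustrative only): the mean shifts after step 500.
    stream = [rng.gauss(0.0, 0.1) for _ in range(500)]
    stream += [rng.gauss(1.0, 0.1) for _ in range(500)]
    print("change flagged at index:", detect_change(stream))
```

Because the test is repeated on every new window, the significance level `alpha` would in practice need to account for multiple comparisons; the value used here is only a placeholder.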
Acknowledgement
This work was funded by the Department of Defence and the Office of National Intelligence under the AI for Decision Making Program, delivered in partnership with the NSW Defence Innovation Network, Grant Number RG213520.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hussein, M., Keshk, M., Hussein, A. (2024). Non-stationarity Detection in Model-Free Reinforcement Learning via Value Function Monitoring. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_28
DOI: https://doi.org/10.1007/978-981-99-8391-9_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8390-2
Online ISBN: 978-981-99-8391-9
eBook Packages: Computer Science, Computer Science (R0)