Abstract
Deep reinforcement learning (DRL) is currently used to solve Markov decision process problems, for which the environment is typically assumed to be stationary. In this paper, we propose an adaptive DRL method for non-stationary environments. First, we introduce model uncertainty and propose a self-adjusting deep Q-learning algorithm that automatically rebalances exploration and exploitation as the environment changes. Second, we propose a feasible criterion for judging whether the parameter settings of the deep Q-network are appropriate, and we minimize the misjudgment probability based on the large deviation principle (LDP). The effectiveness of the proposed adaptive DRL method is illustrated on an advanced persistent threat (APT) attack simulation game. Experimental results show that, compared with classic deep Q-learning algorithms in non-stationary and stationary environments, the adaptive DRL method improves performance by at least 14.28% and 30.56%, respectively.
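The abstract only outlines the two mechanisms, so the fragments below are illustrative sketches rather than the paper's algorithm. The first is a minimal sketch of uncertainty-driven rebalancing of exploration and exploitation, assuming MC-dropout (in the spirit of Gal and Ghahramani) as the model-uncertainty estimate and a hypothetical monotone mapping from uncertainty to the epsilon-greedy rate; the names `DropoutQNet`, `self_adjusting_epsilon`, and the `scale` parameter are all assumptions for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

class DropoutQNet(nn.Module):
    """Q-network with dropout layers that can be kept active at
    inference time to draw stochastic forward passes (MC-dropout)."""
    def __init__(self, state_dim, n_actions, hidden=128, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)

def model_uncertainty(qnet, state, n_samples=20):
    """Epistemic-uncertainty proxy: variance of Q-values across
    stochastic dropout passes; it rises when the environment drifts
    away from the data the network was trained on."""
    qnet.train()  # keep dropout active while sampling
    with torch.no_grad():
        qs = torch.stack([qnet(state) for _ in range(n_samples)])
    qnet.eval()   # restore deterministic mode for the greedy choice
    return qs.var(dim=0).mean().item()

def self_adjusting_epsilon(u, eps_min=0.05, eps_max=0.9, scale=1.0):
    """Hypothetical mapping: high uncertainty (e.g., right after an
    environment change) pushes epsilon toward eps_max, restoring
    exploration; low uncertainty lets it decay toward eps_min."""
    return eps_min + (eps_max - eps_min) * (1.0 - np.exp(-scale * u))

# Usage: tie the epsilon-greedy rate to the current uncertainty.
qnet = DropoutQNet(state_dim=4, n_actions=2)
state = torch.randn(1, 4)
eps = self_adjusting_epsilon(model_uncertainty(qnet, state))
action = (np.random.randint(2) if np.random.rand() < eps
          else qnet(state).argmax(dim=1).item())
```

The LDP ingredient can likewise be illustrated with a Cramér/Chernoff-type bound: for i.i.d. performance samples, the probability that the empirical mean crosses a test threshold decays exponentially in the sample size, so the misjudgment probability of a threshold test can be bounded and then minimized over the test parameters. The Gaussian rate function below is the standard textbook case, assumed here for illustration, not the paper's derivation.

```python
import numpy as np

def misjudgment_bound(n, threshold, mu, sigma):
    """Chernoff bound for i.i.d. Gaussian samples with mean mu and
    std sigma: P(sample mean >= threshold) <= exp(-n * I(threshold)),
    with rate function I(x) = (x - mu)**2 / (2 * sigma**2), x >= mu."""
    rate = (threshold - mu) ** 2 / (2.0 * sigma ** 2)
    return float(np.exp(-n * rate))
```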
Acknowledgements
This work was supported by the National Key Research and Development Project (Grant No. 2018AAA0100802).
Cite this article
Zhu, J., Wei, Y., Kang, Y. et al. Adaptive deep reinforcement learning for non-stationary environments. Sci. China Inf. Sci. 65, 202204 (2022). https://doi.org/10.1007/s11432-021-3347-8