Adaptive deep reinforcement learning for non-stationary environments

  • Research Paper
  • Published in Science China Information Sciences

Abstract

Deep reinforcement learning (DRL) is commonly used to solve Markov decision process (MDP) problems in which the environment is typically assumed to be stationary. In this paper, we propose an adaptive DRL method for non-stationary environments. First, we introduce model uncertainty and propose a self-adjusting deep Q-learning algorithm, which automatically rebalances exploration and exploitation as the environment changes. Second, we propose a feasible criterion for judging whether the parameter settings of deep Q-networks are appropriate, and we minimize the misjudgment probability based on the large deviation principle (LDP). The effectiveness of the proposed adaptive DRL method is illustrated with an advanced persistent threat (APT) attack simulation game. Experimental results show that, compared with classic deep Q-learning algorithms in non-stationary and stationary environments, the adaptive DRL method improves performance by at least 14.28% and 30.56%, respectively.
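
The algorithm itself is given in the full paper rather than on this preview page. Purely as a hypothetical sketch of the general idea of re-balancing exploration and exploitation from an uncertainty signal in a non-stationary setting, the snippet below scales the epsilon-greedy exploration rate of a plain tabular Q-learning agent with the volatility of its recent temporal-difference errors. All class, method, and parameter names are illustrative assumptions; the paper's deep Q-network, model-uncertainty estimate, and LDP-based criterion are not reproduced here.

# Hypothetical sketch only: NOT the algorithm proposed in the paper.
# It illustrates one way an agent might raise its exploration rate when a
# crude uncertainty proxy (spread of recent TD errors) suggests the
# environment has changed.
import numpy as np


class UncertaintyAdaptiveAgent:
    """Tabular Q-learning agent whose epsilon grows with TD-error volatility."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 base_epsilon=0.05, uncertainty_gain=2.0, window=50):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma
        self.base_epsilon = base_epsilon          # exploration floor
        self.uncertainty_gain = uncertainty_gain  # how strongly uncertainty raises epsilon
        self.window = window                      # size of the TD-error window
        self.td_errors = []                       # sliding window of |TD error|

    def epsilon(self):
        # Explore fully until the window fills, then let the exploration rate
        # track the spread of recent TD errors, capped at 1.
        if len(self.td_errors) < self.window:
            return 1.0
        spread = float(np.std(self.td_errors[-self.window:]))
        return min(1.0, self.base_epsilon + self.uncertainty_gain * spread)

    def act(self, state, rng):
        if rng.random() < self.epsilon():
            return int(rng.integers(self.Q.shape[1]))  # explore
        return int(np.argmax(self.Q[state]))           # exploit

    def update(self, s, a, r, s_next):
        td_target = r + self.gamma * np.max(self.Q[s_next])
        td_error = td_target - self.Q[s, a]
        self.Q[s, a] += self.alpha * td_error
        self.td_errors.append(abs(td_error))
        if len(self.td_errors) > self.window:
            self.td_errors.pop(0)


# Example usage on a hypothetical 10-state, 4-action task:
# rng = np.random.default_rng(0)
# agent = UncertaintyAdaptiveAgent(n_states=10, n_actions=4)
# action = agent.act(state=0, rng=rng)
# agent.update(0, action, r=1.0, s_next=3)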

Acknowledgements

This work was supported by the National Key Research and Development Project (Grant No. 2018AAA0100802).

Author information

Corresponding authors

Correspondence to Jin Zhu or Yu Kang.

About this article

Cite this article

Zhu, J., Wei, Y., Kang, Y. et al. Adaptive deep reinforcement learning for non-stationary environments. Sci. China Inf. Sci. 65, 202204 (2022). https://doi.org/10.1007/s11432-021-3347-8
