
Dynamic fusion for ensemble of deep Q-network

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics

Abstract

Ensemble reinforcement learning, which combines the decisions of a set of base agents, has been proposed to improve decision making and speed up training. Many studies indicate that an ensemble may achieve better results than any single agent because the base agents complement one another: the error of one agent may be corrected by the others. However, the fusion method is a fundamental issue in ensemble learning. Existing studies focus mainly on static fusion, which either assumes all agents have the same ability or discards agents with poor average performance. As a result, static fusion overlooks base agents that perform poorly overall but excel in particular scenarios, so the ability of some agents is not fully utilized. This study proposes a dynamic fusion method that utilizes each base agent according to its local competence on test states. The performance of a base agent on the validation set is measured by the reward it achieves over the next n steps. The similarity between a validation state and a new state is quantified by Euclidean distance in the latent space, and the weight of each base agent is updated according to its performance on validation states and their similarity to the new state. Experimental studies confirm that the proposed dynamic fusion method outperforms both its base agents and the static fusion methods. To our knowledge, this is the first dynamic fusion method for deep reinforcement learning, extending the study of dynamic fusion from classification to reinforcement learning.
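The weighting scheme described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's exact formulation: the exponential similarity kernel, the `temperature` parameter, the min-shift normalization, and the function names are all assumptions for the sketch; the latent encodings and per-agent n-step validation rewards are assumed to be precomputed.

```python
import numpy as np

def dynamic_fusion_weights(z_new, z_val, val_rewards, temperature=1.0):
    """Weight base agents by their local competence on validation
    states that lie close to the new state in latent space.

    z_new:       latent encoding of the new state, shape (d,)
    z_val:       latent encodings of validation states, shape (m, d)
    val_rewards: n-step reward of each agent on each validation
                 state, shape (m, k) for k base agents
    """
    # Euclidean distance between the new state and each validation state
    dist = np.linalg.norm(z_val - z_new, axis=1)            # (m,)
    # Closer validation states contribute more (assumed exponential kernel)
    sim = np.exp(-dist / temperature)
    sim /= sim.sum()
    # Local competence of each agent: similarity-weighted validation reward
    competence = sim @ val_rewards                          # (k,)
    # Shift to be non-negative and normalize into fusion weights
    w = competence - competence.min()
    if w.sum() > 0:
        return w / w.sum()
    # All agents equally competent locally: fall back to uniform weights
    k = val_rewards.shape[1]
    return np.full(k, 1.0 / k)

def fused_q_values(q_values, weights):
    """Weighted fusion of per-agent Q-values, q_values shape (k, n_actions)."""
    return weights @ q_values
```

For example, if one validation state is nearly identical to the new state in latent space, the agent that earned the higher n-step reward there dominates the fused Q-values, even if its average performance elsewhere is poor.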



Acknowledgements

This paper is supported by the Natural Science Foundation of Guangdong Province, China (No. 2018A030313203) and the Fundamental Research Funds for the Central Universities (No. 2018ZD32).

Author information

Correspondence to Meng Xiao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chan, P.P.K., Xiao, M., Qin, X. et al. Dynamic fusion for ensemble of deep Q-network. Int. J. Mach. Learn. & Cyber. 12, 1031–1040 (2021). https://doi.org/10.1007/s13042-020-01218-z
