Abstract
Ensemble reinforcement learning, which combines the decisions of a set of base agents, is proposed to improve decision making and reduce training time. Many studies indicate that an ensemble may outperform a single agent because the base agents complement one another: the error of one agent may be corrected by the others. However, the fusion method is a fundamental issue in ensemble learning. Existing studies focus mainly on static fusion, which either assumes that all agents have the same ability or discards agents with poor average performance. As a result, static fusion overlooks base agents that perform poorly overall but excel in particular scenarios, so the ability of these agents is not fully utilized. This study proposes a dynamic fusion method that utilizes each base agent according to its local competence on test states. The performance of a base agent on the validation set is measured by the rewards it achieves in the next n steps. The similarity between a validation state and a new state is quantified by the Euclidean distance in the latent space, and the weight of each base agent is updated according to its performance on validation states and their similarity to the new state. The experimental studies confirm that the proposed dynamic fusion method outperforms both its base agents and static fusion methods. To the best of our knowledge, this is the first dynamic fusion method proposed for deep reinforcement learning, extending the study of dynamic fusion from classification to reinforcement learning.
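The weighting scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each agent's competence on every validation state is pre-computed as its n-step reward, and it uses a Gaussian kernel over the Euclidean latent distance plus normalization as one plausible way to turn distances into similarity weights (the kernel choice and the hypothetical `sigma` parameter are assumptions, as the abstract only specifies Euclidean distance).

```python
import numpy as np

def dynamic_fusion_weights(z_new, z_val, perf, sigma=1.0):
    """Weight each base agent by its local competence near a new state.

    z_new : (d,)    latent embedding of the new (test) state
    z_val : (m, d)  latent embeddings of m validation states
    perf  : (m, k)  n-step reward of each of k agents on each validation state
    sigma : kernel bandwidth (illustrative assumption, not from the paper)
    """
    # Euclidean distance in latent space -> similarity of validation states
    d2 = np.sum((z_val - z_new) ** 2, axis=1)
    sim = np.exp(-d2 / (2.0 * sigma ** 2))
    sim /= sim.sum()
    # Agent competence = similarity-weighted validation performance
    w = sim @ perf
    w = np.clip(w, 0.0, None)  # guard against negative rewards
    if w.sum() == 0.0:
        return np.full(perf.shape[1], 1.0 / perf.shape[1])
    return w / w.sum()

def fused_action(q_values, weights):
    """Pick the action maximizing the weighted sum of the agents' Q-values.

    q_values : (k, n_actions) Q-value vector of each base agent
    weights  : (k,)           output of dynamic_fusion_weights
    """
    return int(np.argmax(weights @ q_values))
```

With a test state near a validation state on which agent 0 earned high reward, agent 0 receives the larger weight and dominates the fused Q-values, which is the intended behavior: an agent that is weak on average still drives the decision in states where it is locally competent.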
Acknowledgements
This paper is supported by the Natural Science Foundation of Guangdong Province, China (No. 2018A030313203) and the Fundamental Research Funds for the Central Universities (No. 2018ZD32).
Cite this article
Chan, P.P.K., Xiao, M., Qin, X. et al. Dynamic fusion for ensemble of deep Q-network. Int. J. Mach. Learn. & Cyber. 12, 1031–1040 (2021). https://doi.org/10.1007/s13042-020-01218-z