
A novel action decision method of deep reinforcement learning based on a neural network and confidence bound


Abstract

In deep reinforcement learning, the excessive randomness of the ε-greedy method degrades the agent's training performance. This paper proposes a novel action decision method that replaces the ε-greedy method and avoids this excessive randomness. First, a confidence bound span fitting model based on a deep neural network is proposed, fundamentally addressing the problem that UCB cannot estimate the confidence bound span of each action in a high-dimensional state space. Second, a confidence bound span balance model based on target values in reverse order is proposed: after each action decision, the parameters of the U network are updated via backpropagation to balance the confidence bound spans. Finally, an exploration-exploitation dynamic balance factor α is introduced to balance exploration and exploitation during training. Experiments with the Nature DQN and Double DQN algorithms demonstrate that the proposed method outperforms the ε-greedy method under the baseline algorithms and experimental environments of this paper. The presented method is significant for applying confidence bounds to complex reinforcement learning problems.
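The abstract describes the decision rule only at a high level. As a concrete illustration, the minimal sketch below (not the authors' code) contrasts ε-greedy selection with a confidence-bound decision in which a separate U network fits a per-action confidence bound span and a dynamic factor α weights it against the Q value. The argmax form Q(s, a) + α·U(s, a), the network sizes, and the linear α schedule are assumptions made for illustration; the paper's reverse-order target-value update for the U network is not reproduced here.

```python
# Minimal sketch (assumptions, not the authors' implementation) contrasting
# epsilon-greedy action selection with a confidence-bound decision that uses
# a separate "U network" to estimate each action's confidence bound span.
import random

import torch
import torch.nn as nn


class MLP(nn.Module):
    """Small fully connected network mapping a state to one value per action."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def epsilon_greedy(q_net: MLP, state: torch.Tensor, epsilon: float) -> int:
    """Baseline the paper replaces: with probability epsilon the agent
    ignores its Q estimates and explores uniformly at random."""
    if random.random() < epsilon:
        return random.randrange(q_net.n_actions)
    with torch.no_grad():
        return int(q_net(state).argmax().item())


def confidence_bound_decision(q_net: MLP, u_net: MLP,
                              state: torch.Tensor, alpha: float) -> int:
    """Hypothetical confidence-bound decision: pick the action maximizing
    Q(s, a) + alpha * U(s, a), where the U network fits the confidence
    bound span of each action in the high-dimensional state."""
    with torch.no_grad():
        scores = q_net(state) + alpha * u_net(state)
    return int(scores.argmax().item())


def dynamic_alpha(step: int, alpha_max: float = 1.0, alpha_min: float = 0.05,
                  decay_steps: int = 10_000) -> float:
    """Hypothetical exploration-exploitation dynamic balance factor: decays
    linearly so exploration dominates early training and exploitation
    dominates later. The paper's actual schedule may differ."""
    frac = min(step / decay_steps, 1.0)
    return alpha_max + frac * (alpha_min - alpha_max)
```

Under this reading, the confidence-bound decision stays deterministic given the two networks, so exploration is directed toward actions with wide confidence bound spans rather than injected as uniform noise.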




Data availability

The datasets analysed during the current study are available in the OpenAI Gym library, https://www.gymlibrary.dev/.
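As a usage note, environments from the cited Gym library follow the standard reset/step API documented at the URL above. The sketch below uses CartPole-v1 as an illustrative environment, since this section does not specify which Gym tasks were evaluated.

```python
# Minimal use of the OpenAI Gym API (https://www.gymlibrary.dev/).
# CartPole-v1 is an illustrative choice, not necessarily an environment
# used in the paper; a random policy stands in for the learned agent.
import gym

env = gym.make("CartPole-v1")
state, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()  # placeholder for the agent's decision
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```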


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 62073245), the Natural Science Foundation of Shanghai (20ZR1440500), and Pudong New Area Science & Technology Development Fund (PKX2021-R07).

Author information

Contributions

Study design: Wenhao Zhang, Yaqing Song, Xiangpeng Liu, Qianqian Shangguan and Kang An; Conduct of the study: Wenhao Zhang; Writing—original draft: Wenhao Zhang and Yaqing Song; Supervision: Kang An, Xiangpeng Liu and Qianqian Shangguan; Writing—review and editing: Yaqing Song, Wenhao Zhang and Kang An. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kang An.

Ethics declarations

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, W., Song, Y., Liu, X. et al. A novel action decision method of deep reinforcement learning based on a neural network and confidence bound. Appl Intell 53, 21299–21311 (2023). https://doi.org/10.1007/s10489-023-04695-1
