Abstract
In deep reinforcement learning, the excessive randomness of the ε-greedy method can degrade an agent's training performance. This paper proposes a novel action decision method that replaces the ε-greedy method and avoids this excessive randomness. First, a confidence bound span fitting model based on a deep neural network is proposed to address the fundamental limitation that UCB cannot estimate the confidence bound span of each action in a high-dimensional state space. Then, a confidence bound span balance model based on target values in reverse order is proposed: after each action decision, the parameters of the U network are updated via the backpropagation of the neural network to balance the confidence bound span. Finally, an exploration-exploitation dynamic balance factor \(\alpha\) is introduced to balance exploration and exploitation during training. Experiments are conducted with the Nature DQN and Double DQN algorithms, and the results demonstrate that the proposed method outperforms the ε-greedy method under the basic algorithms and experimental environments used in this paper. The method presented here is significant for applying confidence bounds to complex reinforcement learning problems.
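The decision rule summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the U network is stood in for by a hand-rolled linear model, and the names `UNetwork`, `select_action`, `q_values`, and `alpha`, as well as the span-shrinking target, are hypothetical choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

class UNetwork:
    """Toy stand-in (an assumption, not the paper's architecture) for the
    U network: it maps a state to a confidence bound span per action."""

    def __init__(self, state_dim, n_actions, lr=0.1):
        # Non-negative weights so spans start positive for non-negative states.
        self.W = np.abs(rng.normal(size=(n_actions, state_dim)))
        self.lr = lr

    def predict(self, state):
        return self.W @ state

    def update(self, state, action, target_span):
        # One gradient step on a squared error, playing the role of the
        # backpropagation update that balances the confidence bound span.
        err = self.predict(state)[action] - target_span
        self.W[action] -= self.lr * err * state


def select_action(q_values, u_net, state, alpha):
    """UCB-style decision: exploit Q, explore via the predicted span,
    with alpha as the exploration-exploitation balance factor."""
    return int(np.argmax(q_values + alpha * u_net.predict(state)))


# Illustrative run with a fixed toy state and hand-set Q-values.
state = rng.uniform(size=4)            # non-negative toy state
q = np.array([1.0, 1.2, 0.9])
u_net = UNetwork(state_dim=4, n_actions=3)

a = select_action(q, u_net, state, alpha=1.0)
before = u_net.predict(state)[a]
# After acting, pull the chosen action's span toward a smaller target,
# so repeatedly chosen actions gradually lose their exploration bonus.
u_net.update(state, a, target_span=0.5 * before)
```

With `alpha=0` the rule reduces to pure exploitation (greedy over Q), while larger `alpha` weights the learned confidence bound span more heavily; in the paper this factor is scheduled dynamically over training rather than held fixed.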
Data availability
The datasets analysed during the current study are available in the OpenAI Gym library, https://www.gymlibrary.dev/.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 62073245), the Natural Science Foundation of Shanghai (20ZR1440500), and Pudong New Area Science & Technology Development Fund (PKX2021-R07).
Author information
Contributions
Study design: Wenhao Zhang, Yaqing Song, Xiangpeng Liu, Qianqian Shangguan and Kang An; Conduct of the study: Wenhao Zhang; Writing—original draft: Wenhao Zhang and Yaqing Song; Supervision: Kang An, Xiangpeng Liu and Qianqian Shangguan; Writing—review and editing: Yaqing Song, Wenhao Zhang and Kang An. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Cite this article
Zhang, W., Song, Y., Liu, X. et al. A novel action decision method of deep reinforcement learning based on a neural network and confidence bound. Appl Intell 53, 21299–21311 (2023). https://doi.org/10.1007/s10489-023-04695-1