DOI: 10.1145/3342999.3343012

A Priority Experience Replay Sampling Method Based on Upper Confidence Bound

Published: 05 July 2019

Abstract

With the development of deep learning and of computing power, the end-to-end learning mechanism of reinforcement learning has gradually shown its advantages in control, strategy, and other application fields. One key to the success of deep reinforcement learning is the combination of a sample experience pool with an experience replay algorithm. In the experience replay mechanism of reinforcement learning, the traditional uniform random sampling algorithm learns inefficiently and fails to make full use of the information carried by the samples themselves. This paper proposes a new sampling algorithm for experience replay that avoids the purely random draws from the experience pool used by the traditional algorithm and instead spends only part of the training time replaying selected samples repeatedly. By combining the characteristics of the UCB (Upper Confidence Bound) algorithm with the sample diversity required during sampling, the algorithm is shown to converge quickly to the optimal solution in a continuous-motion-control grasping experiment on a robotic arm.
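The paper's exact scoring rule is given in the full text; as a minimal illustrative sketch only, the buffer below combines a per-transition priority (here assumed to be the absolute TD error, as in standard prioritized replay) with a UCB-style exploration bonus that decays as a transition is replayed more often, so that rarely replayed samples keep a chance of selection and batch diversity is preserved. The class name, the scoring formula, and the use of |TD error| as the priority are assumptions for illustration, not the authors' published method.

```python
import math

class UCBReplayBuffer:
    """Illustrative replay buffer: each stored transition gets a UCB score,
    priority + c * sqrt(ln(total replays) / (1 + times this sample replayed)),
    and training batches are drawn from the highest-scoring transitions."""

    def __init__(self, capacity, c=2.0):
        self.capacity = capacity
        self.c = c                 # exploration coefficient of the UCB bonus
        self.transitions = []      # stored (s, a, r, s_next, done) tuples
        self.priorities = []       # assumed priority, e.g. |TD error|
        self.replay_counts = []    # times each transition has been replayed
        self.total_replays = 0

    def add(self, transition, priority=1.0):
        if len(self.transitions) >= self.capacity:   # drop the oldest entry
            self.transitions.pop(0)
            self.priorities.pop(0)
            self.replay_counts.pop(0)
        self.transitions.append(transition)
        self.priorities.append(priority)
        self.replay_counts.append(0)

    def sample(self, batch_size):
        # UCB score = exploitation term (priority) + exploration bonus that
        # shrinks for transitions that have already been replayed often.
        t = self.total_replays + 1
        scores = [
            p + self.c * math.sqrt(math.log(t) / (n + 1))
            for p, n in zip(self.priorities, self.replay_counts)
        ]
        # Take the batch_size transitions with the highest scores.
        idx = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        idx = idx[:batch_size]
        for i in idx:
            self.replay_counts[i] += 1
            self.total_replays += 1
        return idx, [self.transitions[i] for i in idx]

    def update_priorities(self, indices, new_priorities):
        # After a gradient step, refresh priorities with the new |TD errors|.
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = abs(p)
```

Greedy top-k selection by score is only one possible design here; drawing indices with probability proportional to the UCB scores would be a softer alternative that trades determinism for extra diversity.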


Cited By

  • (2024) Deep-reinforcement-learning-based UAV autonomous navigation and collision avoidance in unknown environments. Chinese Journal of Aeronautics, 37(3): 237-257. DOI: 10.1016/j.cja.2023.09.033. Online publication date: March 2024.


Information

    Published In

    ICDLT '19: Proceedings of the 2019 3rd International Conference on Deep Learning Technologies
    July 2019
    106 pages
    ISBN:9781450371605
    DOI:10.1145/3342999
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • Nanyang Technological University
    • Chongqing University of Posts and Telecommunications

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 July 2019


    Author Tags

    1. Continuous Control
    2. Reinforcement Learning
3. SCARA Robot
    4. Upper Confidence Bound

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Scientific Research Foundation of Science and Technology Department of Hubei Province
    • Scientific and Technological Research of Education Department of Hubei Province
    • Doctor Launching Fund of Hubei University of Technology

    Conference

    ICDLT 2019


Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)6
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 15 Feb 2025

