DOI: 10.1145/3342999.3343012

A Priority Experience Replay Sampling Method Based on Upper Confidence Bound

Published: 05 July 2019

Abstract

With the development of deep learning and of computing power, the end-to-end learning mechanism of reinforcement learning has gradually shown its advantages in control, strategy, and other application fields. One key to the success of deep reinforcement learning is the combination of a sample experience pool with an experience replay algorithm. In the experience replay mechanism of reinforcement learning, the traditional uniform random sampling algorithm learns inefficiently and fails to make full use of the information carried by the samples themselves. This paper proposes a new sampling algorithm for experience replay that avoids the purely random draws from the experience pool used by the traditional algorithm and instead spends only part of the training time replaying selected samples repeatedly. By combining the characteristics of the UCB (Upper Confidence Bound) algorithm with the sample diversity required during sampling, the algorithm is shown to converge quickly to the optimal solution in a continuous-motion-control grasping experiment on a robotic arm.
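The paper's exact scoring rule is given in the full text; as a minimal illustrative sketch only, the buffer below combines a per-transition priority (here assumed to be the absolute TD error, as in standard prioritized replay) with a UCB-style exploration bonus that decays as a transition is replayed more often, so that rarely replayed samples keep a chance of selection and batch diversity is preserved. The class name, the scoring formula, and the use of |TD error| as the priority are assumptions for illustration, not the authors' published method.

```python
import math

class UCBReplayBuffer:
    """Illustrative replay buffer: each stored transition gets a UCB score,
    priority + c * sqrt(ln(total replays) / (1 + times this sample replayed)),
    and training batches are drawn from the highest-scoring transitions."""

    def __init__(self, capacity, c=2.0):
        self.capacity = capacity
        self.c = c                 # exploration coefficient of the UCB bonus
        self.transitions = []      # stored (s, a, r, s_next, done) tuples
        self.priorities = []       # assumed priority, e.g. |TD error|
        self.replay_counts = []    # times each transition has been replayed
        self.total_replays = 0

    def add(self, transition, priority=1.0):
        if len(self.transitions) >= self.capacity:   # drop the oldest entry
            self.transitions.pop(0)
            self.priorities.pop(0)
            self.replay_counts.pop(0)
        self.transitions.append(transition)
        self.priorities.append(priority)
        self.replay_counts.append(0)

    def sample(self, batch_size):
        # UCB score = exploitation term (priority) + exploration bonus that
        # shrinks for transitions that have already been replayed often.
        t = self.total_replays + 1
        scores = [
            p + self.c * math.sqrt(math.log(t) / (n + 1))
            for p, n in zip(self.priorities, self.replay_counts)
        ]
        # Take the batch_size transitions with the highest scores.
        idx = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        idx = idx[:batch_size]
        for i in idx:
            self.replay_counts[i] += 1
            self.total_replays += 1
        return idx, [self.transitions[i] for i in idx]

    def update_priorities(self, indices, new_priorities):
        # After a gradient step, refresh priorities with the new |TD errors|.
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = abs(p)
```

Greedy top-k selection by score is only one possible design here; drawing indices with probability proportional to the UCB scores would be a softer alternative that trades determinism for extra diversity.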


Cited By

  • (2024) Deep-reinforcement-learning-based UAV autonomous navigation and collision avoidance in unknown environments. Chinese Journal of Aeronautics, 37(3): 237-257. DOI: 10.1016/j.cja.2023.09.033. Online publication date: March 2024.


Information

    Published In

    ICDLT '19: Proceedings of the 2019 3rd International Conference on Deep Learning Technologies
    July 2019
    106 pages
    ISBN:9781450371605
    DOI:10.1145/3342999
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • Nanyang Technological University
    • Chongqing University of Posts and Telecommunications

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 July 2019


    Author Tags

    1. Continuous Control
    2. Reinforcement Learning
3. SCARA Robot
    4. Upper Confidence Bound

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Scientific Research Foundation of Science and Technology Department of Hubei Province
    • Scientific and Technological Research of Education Department of Hubei Province
    • Doctor Launching Fund of Hubei University of Technology

    Conference

    ICDLT 2019


Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)6
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 15 Feb 2025

