
Precise Evaluation for Continuous Action Control in Reinforcement Learning

Published: 22 June 2019

Abstract

With the rise of deep learning, reinforcement learning has also come into focus, achieving remarkable results in video games, Go, and other domains. Most of the control problems in these domains, however, involve discrete action control with plentiful rewards. Continuous action control in reinforcement learning is closer to real-world control problems and is regarded as one of the main paths toward artificial intelligence, so it remains an active research topic. Traditional continuous-control algorithms in reinforcement learning rely on an evaluation network that reduces its assessment to a single scalar value. In this paper, a precise evaluation mechanism and a corresponding objective function are proposed to accelerate the reinforcement learning training process. Experimental results show that precise evaluation with a log-cosh objective function enables a robot-arm grasping task to converge more quickly and complete training reliably.
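The paper's evaluation network is not reproduced on this page; as a reading aid, the sketch below shows a numerically stable log-cosh objective of the kind the abstract refers to, in plain NumPy. The function name, array shapes, and mean reduction are illustrative assumptions, not the authors' exact formulation.

    import numpy as np

    def log_cosh_loss(predicted, target):
        """Log-cosh objective: approximately quadratic for small errors
        (smooth gradients near convergence) and approximately linear for
        large errors (bounded gradients early in training).

        Illustrative sketch only -- shapes and reduction are assumptions,
        not the paper's exact formulation.
        """
        residual = np.asarray(predicted) - np.asarray(target)
        # Stable identity: log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2),
        # which avoids overflow in cosh() for large residuals.
        elementwise = (np.abs(residual)
                       + np.log1p(np.exp(-2.0 * np.abs(residual)))
                       - np.log(2.0))
        return elementwise.mean()

    # Example: loss over a small batch of value predictions vs. targets.
    pred = np.array([0.5, 2.0, -1.0])
    targ = np.array([0.4, 0.0, -1.2])
    print(log_cosh_loss(pred, targ))  # small scalar loss

Compared with squared error, the bounded gradient of log-cosh keeps early updates stable when value estimates are far off, which is consistent with the abstract's claim of faster convergence.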

Cited By

• Checking Scheduling-Induced Violations of Control Safety Properties. In: Automated Technology for Verification and Analysis, 2022, pp. 100-116. DOI: 10.1007/978-3-031-19992-9_7. Online publication date: 25 Oct 2022.

    Published In

HPCCT '19: Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference
June 2019, 293 pages
ISBN: 9781450371858
DOI: 10.1145/3341069

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Continuous Control
    2. Precise Evaluation
    3. Reinforcement Learning
    4. Robot

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Research Foundation of Science and Technology Department of Hubei Province
    • Doctor Launching Fund of Hubei University of Technology
    • Scientific and Technological Research of Education Department of Hubei Province
