
Precise Evaluation for Continuous Action Control in Reinforcement Learning

Published: 22 June 2019

Abstract

With the rise of deep learning, reinforcement learning has also come into focus, achieving remarkable results in video games, Go, and other domains. Most of the control problems in these domains, however, involve discrete action control with plentiful rewards. Continuous action control in reinforcement learning is closer to real-world control problems and is regarded as one of the main paths toward artificial intelligence, so it remains an active research topic. Traditional continuous-control algorithms in reinforcement learning rely on an evaluation network that reduces its assessment to a single scalar value. In this paper, a precise evaluation mechanism and a corresponding objective function are proposed to accelerate the reinforcement learning training process. Experimental results show that precise evaluation with a log-cosh objective function enables a robot-arm grasping task to converge more quickly and complete training reliably.
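The paper's evaluation network is not reproduced on this page; as a reading aid, the sketch below shows a numerically stable log-cosh objective of the kind the abstract refers to, in plain NumPy. The function name, array shapes, and mean reduction are illustrative assumptions, not the authors' exact formulation.

    import numpy as np

    def log_cosh_loss(predicted, target):
        """Log-cosh objective: approximately quadratic for small errors
        (smooth gradients near convergence) and approximately linear for
        large errors (bounded gradients early in training).

        Illustrative sketch only -- shapes and reduction are assumptions,
        not the paper's exact formulation.
        """
        residual = np.asarray(predicted) - np.asarray(target)
        # Stable identity: log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2),
        # which avoids overflow in cosh() for large residuals.
        elementwise = (np.abs(residual)
                       + np.log1p(np.exp(-2.0 * np.abs(residual)))
                       - np.log(2.0))
        return elementwise.mean()

    # Example: loss over a small batch of value predictions vs. targets.
    pred = np.array([0.5, 2.0, -1.0])
    targ = np.array([0.4, 0.0, -1.2])
    print(log_cosh_loss(pred, targ))  # small scalar loss

Compared with squared error, the bounded gradient of log-cosh keeps early updates stable when value estimates are far off, which is consistent with the abstract's claim of faster convergence.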

Cited By

• Checking Scheduling-Induced Violations of Control Safety Properties. In: Automated Technology for Verification and Analysis, 2022, pp. 100-116. DOI: 10.1007/978-3-031-19992-9_7. Online publication date: 25 Oct 2022.

    Published In

HPCCT '19: Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference
June 2019, 293 pages
ISBN: 9781450371858
DOI: 10.1145/3341069

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Continuous Control
    2. Precise Evaluation
    3. Reinforcement Learning
    4. Robot

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Research Foundation of Science and Technology Department of Hubei Province
    • Doctor Launching Fund of Hubei University of Technology
    • Scientific and Technological Research of Education Department of Hubei Province
