ABSTRACT
In this research, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is used to solve a challenging virtual artificial intelligence task: training a four-legged "ant" robot, as an intelligent agent, to run across a field. TD3 is a deep reinforcement learning model that combines several state-of-the-art methods in artificial intelligence, including policy gradients, actor-critic architectures, and continuous double deep Q-learning. These deep reinforcement learning approaches train an intelligent agent to interact with an environment with automatic feature engineering, that is, requiring minimal domain knowledge. For the implementation of TD3, we used two-layer feedforward neural networks of 400 and 300 hidden nodes respectively, with rectified linear unit (ReLU) activations between the layers, for both the actor and the critics, and added a final tanh unit after the output of the actor. Each critic receives both the state and the action as input to its first layer. All network parameters were updated using the Adam optimizer. The idea behind TD3 is to reduce overestimation bias: the techniques that mitigate it in deep Q-learning with discrete actions are ineffective in an actor-critic setting, so TD3 adapts them to continuous control. Based on the maximum average reward over the evaluation time-steps, our model achieved an approximate maximum of 2364. We therefore conclude that TD3 improves on both the learning speed and the performance of Deep Deterministic Policy Gradient (DDPG) in a challenging continuous-control environment.
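The actor and critic architecture described above (two feedforward layers of 400 and 300 units, ReLU activations, a tanh output on the actor, state and action concatenated at the critic's first layer, Adam for both networks) can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' exact code; the state and action dimensions and the learning rate are placeholder values chosen for the example.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Two-layer feedforward policy: 400 and 300 hidden units, ReLU
    activations, and a final tanh scaled by the action bound."""
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        # tanh keeps outputs in [-1, 1]; scaling maps them to the action range
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Q-network: the state and action are concatenated and fed into the
    first layer, as described in the abstract. TD3 keeps two such critics
    and uses the minimum of their target estimates to curb overestimation."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))

# Illustrative dimensions (an Ant-like task has a multi-dimensional
# observation and an 8-dimensional joint-torque action space).
actor = Actor(state_dim=28, action_dim=8, max_action=1.0)
critic = Critic(state_dim=28, action_dim=8)

# Both networks are updated with the Adam optimizer; the learning rate
# here is an assumption for the sketch.
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
```

In a full TD3 training loop, two critics of this form are maintained together with delayed target networks, and the Bellman target uses the minimum of the two target-critic estimates.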