DOI: 10.1145/3387168.3387199

Twin-Delayed DDPG: A Deep Reinforcement Learning Technique to Model a Continuous Movement of an Intelligent Robot Agent

Published: 25 May 2020

ABSTRACT

In this research, the Twin-Delayed DDPG (TD3) algorithm is used to solve a challenging virtual Artificial Intelligence task: training a four-legged ant robot, as an intelligent agent, to run across a field. Twin-Delayed DDPG (TD3) is a Deep Reinforcement Learning model that combines state-of-the-art methods in Artificial Intelligence, namely policy gradients, Actor-Critic architectures, and Double Deep Q-Learning for continuous actions. These Deep Reinforcement Learning approaches train an intelligent agent to interact with its environment with automatic feature engineering, that is, they require minimal domain knowledge. For the implementation of TD3, we used two-layer feedforward neural networks of 400 and 300 hidden nodes respectively, with Rectified Linear Unit (ReLU) activations between the layers for both the Actor and the Critics, and a final tanh unit after the output of the Actor. The Critics receive both the state and the action as input to their first layer. The parameters of both networks were updated with the Adam optimizer. The idea behind TD3 is to reduce the overestimation bias of Deep Q-Learning, whose discrete-action remedies are ineffective in an Actor-Critic setting. Measured by the maximum average reward over the evaluation time-steps, our model reached approximately 2364. TD3 therefore improves on both the learning speed and the performance of Deep Deterministic Policy Gradient (DDPG) in a challenging continuous-control environment.
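The network design stated above (two hidden layers of 400 and 300 units, ReLU activations, a tanh output on the Actor, Critics that take the state and action jointly, Adam updates) can be made concrete with a short sketch. The snippet below is a minimal illustration only, assuming PyTorch and placeholder state/action dimensions for an Ant-like environment; the abstract does not name a framework, and the dimensions, max_action scaling, and optimizer settings here are assumptions.

```python
# Minimal sketch of the Actor/Critic networks described in the abstract.
# PyTorch, the environment dimensions, and max_action are assumptions.
import torch
import torch.nn as nn


class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        # Two hidden layers of 400 and 300 units with ReLU in between,
        # and a tanh on the output, scaled to the action range.
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)


class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        # The critic receives the state and action together in its first layer.
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))


# TD3 keeps two critics and uses the smaller of their two target estimates
# to curb overestimation; all parameters are updated with Adam.
state_dim, action_dim, max_action = 28, 8, 1.0  # assumed Ant-like sizes
actor = Actor(state_dim, action_dim, max_action)
critic_1 = Critic(state_dim, action_dim)
critic_2 = Critic(state_dim, action_dim)
actor_opt = torch.optim.Adam(actor.parameters())
critic_opt = torch.optim.Adam(
    list(critic_1.parameters()) + list(critic_2.parameters())
)
```

The "twin" and "delayed" parts of TD3 show up in training rather than in the architecture: the target value is taken as the minimum over the two critics, and the Actor is updated less frequently than the Critics.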

Published in

      ICVISP 2019: Proceedings of the 3rd International Conference on Vision, Image and Signal Processing
      August 2019
      584 pages
      ISBN:9781450376259
      DOI:10.1145/3387168

      Copyright © 2019 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

ICVISP 2019 Paper Acceptance Rate: 126 of 277 submissions, 45%. Overall Acceptance Rate: 186 of 424 submissions, 44%.
