ABSTRACT
In this research, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is used to solve a challenging virtual artificial intelligence task: training a four-legged "ant" robot, as an intelligent agent, to run across a field. TD3 is a deep reinforcement learning model that combines several state-of-the-art methods in artificial intelligence, including policy gradients, actor-critic architectures, and continuous double deep Q-learning. These deep reinforcement learning approaches train an intelligent agent to interact with an environment with automatic feature engineering, that is, requiring minimal domain knowledge. For the implementation of TD3, we used two-layer feedforward neural networks of 400 and 300 hidden nodes respectively, with rectified linear unit (ReLU) activations between the layers, for both the actor and the critics, and added a final tanh unit after the output of the actor. Each critic receives both the state and the action as input to its first layer. All network parameters were updated using the Adam optimizer. The idea behind TD3 is to reduce overestimation bias: the techniques that mitigate it in deep Q-learning with discrete actions are ineffective in an actor-critic setting, so TD3 adapts them to continuous control. Based on the maximum average reward over the evaluation time-steps, our model achieved an approximate maximum of 2364. We therefore conclude that TD3 improves on both the learning speed and the performance of Deep Deterministic Policy Gradient (DDPG) in a challenging continuous-control environment.
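The actor and critic architecture described above (two feedforward layers of 400 and 300 units, ReLU activations, a tanh output on the actor, state and action concatenated at the critic's first layer, Adam for both networks) can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' exact code; the state and action dimensions and the learning rate are placeholder values chosen for the example.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Two-layer feedforward policy: 400 and 300 hidden units, ReLU
    activations, and a final tanh scaled by the action bound."""
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        # tanh keeps outputs in [-1, 1]; scaling maps them to the action range
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Q-network: the state and action are concatenated and fed into the
    first layer, as described in the abstract. TD3 keeps two such critics
    and uses the minimum of their target estimates to curb overestimation."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))

# Illustrative dimensions (an Ant-like task has a multi-dimensional
# observation and an 8-dimensional joint-torque action space).
actor = Actor(state_dim=28, action_dim=8, max_action=1.0)
critic = Critic(state_dim=28, action_dim=8)

# Both networks are updated with the Adam optimizer; the learning rate
# here is an assumption for the sketch.
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
```

In a full TD3 training loop, two critics of this form are maintained together with delayed target networks, and the Bellman target uses the minimum of the two target-critic estimates.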