Abstract
Actor learning and critic learning are the two core components of the Deep Deterministic Policy Gradient (DDPG) algorithm, one of the most widely used deep reinforcement learning methods. Although DDPG plays a significant role in robot learning, its performance is relatively sensitive and unstable. To improve both performance and stability, this paper introduces a multi-actor-critic variant of DDPG for reliable actor-critic learning and integrates it with Hindsight Experience Replay (HER) to form a new deep learning framework, AACHER. AACHER replaces the single actor and critic of DDPG with the average of multiple actors and critics, increasing robustness when any one actor or critic performs poorly; using numerous independent actors and critics is also expected to let the agent learn more broadly from the environment. AACHER is validated on goal-based environments, including AuboReach, FetchReach-v1, FetchPush-v1, FetchSlide-v1, and FetchPickAndPlace-v1, across various actor/critic count combinations. Results reveal that AACHER outperforms the baseline (DDPG+HER) for every actor/critic combination evaluated; on FetchPickAndPlace-v1, A20C20 (20 actors and 20 critics) reaches roughly 3.8 times the success rate of DDPG+HER.
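The central mechanism described above, averaging the outputs of several independently initialized actors and critics in place of DDPG's single actor and critic, can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the linear "networks", the dimensions, and names such as `averaged_action` and `averaged_q` are illustrative stand-ins for the separately trained DDPG actor and critic networks used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, N_ACTORS, N_CRITICS = 4, 2, 3, 3

# Each "actor"/"critic" here is an independently initialized linear map;
# in AACHER these would be separately trained DDPG networks.
actor_weights = [rng.normal(size=(STATE_DIM, ACTION_DIM)) for _ in range(N_ACTORS)]
critic_weights = [rng.normal(size=(STATE_DIM + ACTION_DIM,)) for _ in range(N_CRITICS)]

def averaged_action(state):
    """Average the action proposed by every actor (the actor aggregate)."""
    actions = [np.tanh(state @ W) for W in actor_weights]  # tanh bounds actions
    return np.mean(actions, axis=0)

def averaged_q(state, action):
    """Average the Q-value estimate from every critic (the critic aggregate)."""
    sa = np.concatenate([state, action])
    return float(np.mean([sa @ w for w in critic_weights]))

state = rng.normal(size=STATE_DIM)
a = averaged_action(state)
q = averaged_q(state, a)
```

Because the aggregate is a plain mean, one poorly performing actor or critic shifts the combined output only by 1/N of its error, which is the robustness argument made in the abstract.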
Acknowledgement
This work was partially funded by the U.S. National Science Foundation (NSF) under grants NSF-CAREER: 1846513 and NSF-PFI-TT: 1919127. The views, opinions, findings, and conclusions reflected in this publication are solely those of the authors and do not represent the official policy or position of the NSF.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sehgal, A., Sehgal, M., Manh La, H. (2025). Multi-Actor-Critic Deep Reinforcement Learning with Hindsight Experience Replay. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2024. Lecture Notes in Computer Science, vol 15046. Springer, Cham. https://doi.org/10.1007/978-3-031-77392-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-77391-4
Online ISBN: 978-3-031-77392-1
eBook Packages: Computer Science; Computer Science (R0)