Multi-Actor-Critic Deep Reinforcement Learning with Hindsight Experience Replay

Conference paper in Advances in Visual Computing (ISVC 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15046)

Abstract

Actor learning and critic learning are the two components of the widely used Deep Deterministic Policy Gradient (DDPG) reinforcement learning method. Although DDPG plays a significant role in a robot's overall learning, its performance is relatively sensitive and unstable. To further enhance the performance and stability of DDPG, this paper introduces a multi-actor-critic DDPG for reliable actor-critic learning, which is then integrated with Hindsight Experience Replay (HER) to create a new deep learning framework called AACHER. AACHER substitutes the average value of multiple actors or critics for the single actor or critic in DDPG, increasing robustness when any one actor or critic performs poorly. Numerous independent actors and critics are also expected to gain knowledge from the environment more broadly. The developed AACHER is validated on goal-based environments, including AuboReach, FetchReach-v1, FetchPush-v1, FetchSlide-v1, and FetchPickAndPlace-v1. Various actor/critic combinations are used to experimentally validate the new approach. Results reveal that AACHER outperforms the traditional algorithm (DDPG+HER) across all actor/critic combinations used for evaluation. On FetchPickAndPlace-v1, the performance boost for A20C20 (20 actors and 20 critics) is as high as roughly 3.8 times the success rate of DDPG+HER.
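
The core averaging idea is simple enough to sketch in code. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the Actor and Critic network shapes, the AveragedActorCritic wrapper, and all sizes are hypothetical, and the DDPG updates, replay buffer, and HER goal relabeling are omitted entirely.

import torch
import torch.nn as nn

class Actor(nn.Module):
    # Hypothetical deterministic policy network: state -> action in [-1, 1].
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    # Hypothetical Q-network: (state, action) -> scalar value estimate.
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class AveragedActorCritic:
    # Replaces DDPG's single actor and critic with the mean over N
    # independently initialized copies, as the abstract describes.
    def __init__(self, state_dim, action_dim, n_actors=20, n_critics=20):
        self.actors = [Actor(state_dim, action_dim) for _ in range(n_actors)]
        self.critics = [Critic(state_dim, action_dim) for _ in range(n_critics)]

    def act(self, state):
        # Average the actions proposed by all independent actors.
        return torch.stack([a(state) for a in self.actors]).mean(dim=0)

    def q_value(self, state, action):
        # Average the value estimates of all independent critics.
        return torch.stack([c(state, action) for c in self.critics]).mean(dim=0)

# Example: an A20C20 instance, the best-performing variant in the abstract.
agent = AveragedActorCritic(state_dim=10, action_dim=4)
state = torch.randn(1, 10)
action = agent.act(state)             # mean over 20 actor outputs
value = agent.q_value(state, action)  # mean over 20 critic estimates

Averaging independently initialized networks behaves like an ensemble: a single poorly performing actor or critic is outvoted by the rest, which is the robustness mechanism the abstract attributes to AACHER.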

Acknowledgement

This work was partially funded by the U.S. National Science Foundation (NSF) under grants NSF-CAREER: 1846513 and NSF-PFI-TT: 1919127. The views, opinions, findings, and conclusions reflected in this publication are solely those of the authors and do not represent the official policy or position of the NSF.

Author information

Corresponding authors

Correspondence to Adarsh Sehgal or Hung Manh La.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sehgal, A., Sehgal, M., Manh La, H. (2025). Multi-Actor-Critic Deep Reinforcement Learning with Hindsight Experience Replay. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2024. Lecture Notes in Computer Science, vol 15046. Springer, Cham. https://doi.org/10.1007/978-3-031-77392-1_3

  • DOI: https://doi.org/10.1007/978-3-031-77392-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-77391-4

  • Online ISBN: 978-3-031-77392-1

  • eBook Packages: Computer Science, Computer Science (R0)
