Multi-Actor-Critic Deep Reinforcement Learning with Hindsight Experience Replay

Conference paper in Advances in Visual Computing (ISVC 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15046)

Abstract

Actor learning and critic learning are the two components of the widely used Deep Deterministic Policy Gradient (DDPG) reinforcement learning method. Although DDPG plays a significant role in a robot's overall learning, its performance is relatively sensitive and unstable. To further enhance the performance and stability of DDPG, this paper introduces a multi-actor-critic DDPG for reliable actor-critic learning, which is then integrated with Hindsight Experience Replay (HER) to create a new deep learning framework called AACHER. AACHER substitutes the average value of multiple actors or critics for the single actor or critic in DDPG, increasing robustness when any one actor or critic performs poorly. Numerous independent actors and critics are also expected to gain knowledge from the environment more broadly. The developed AACHER is validated on goal-based environments, including AuboReach, FetchReach-v1, FetchPush-v1, FetchSlide-v1, and FetchPickAndPlace-v1. Various actor/critic combinations are used to experimentally validate the new approach. Results reveal that AACHER outperforms the traditional algorithm (DDPG+HER) across all actor/critic combinations used for evaluation. On FetchPickAndPlace-v1, the performance boost for A20C20 (20 actors and 20 critics) is as high as roughly 3.8 times the success rate of DDPG+HER.
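
The core averaging idea is simple enough to sketch in code. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the Actor and Critic network shapes, the AveragedActorCritic wrapper, and all sizes are hypothetical, and the DDPG updates, replay buffer, and HER goal relabeling are omitted entirely.

import torch
import torch.nn as nn

class Actor(nn.Module):
    # Hypothetical deterministic policy network: state -> action in [-1, 1].
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    # Hypothetical Q-network: (state, action) -> scalar value estimate.
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class AveragedActorCritic:
    # Replaces DDPG's single actor and critic with the mean over N
    # independently initialized copies, as the abstract describes.
    def __init__(self, state_dim, action_dim, n_actors=20, n_critics=20):
        self.actors = [Actor(state_dim, action_dim) for _ in range(n_actors)]
        self.critics = [Critic(state_dim, action_dim) for _ in range(n_critics)]

    def act(self, state):
        # Average the actions proposed by all independent actors.
        return torch.stack([a(state) for a in self.actors]).mean(dim=0)

    def q_value(self, state, action):
        # Average the value estimates of all independent critics.
        return torch.stack([c(state, action) for c in self.critics]).mean(dim=0)

# Example: an A20C20 instance, the best-performing variant in the abstract.
agent = AveragedActorCritic(state_dim=10, action_dim=4)
state = torch.randn(1, 10)
action = agent.act(state)             # mean over 20 actor outputs
value = agent.q_value(state, action)  # mean over 20 critic estimates

Averaging independently initialized networks behaves like an ensemble: a single poorly performing actor or critic is outvoted by the rest, which is the robustness mechanism the abstract attributes to AACHER.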

Acknowledgement

This work was partially funded by the U.S. National Science Foundation (NSF) under grants NSF-CAREER: 1846513 and NSF-PFI-TT: 1919127. The views, opinions, findings, and conclusions reflected in this publication are solely those of the authors and do not represent the official policy or position of the NSF.

Author information

Corresponding authors

Correspondence to Adarsh Sehgal or Hung Manh La.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sehgal, A., Sehgal, M., Manh La, H. (2025). Multi-Actor-Critic Deep Reinforcement Learning with Hindsight Experience Replay. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2024. Lecture Notes in Computer Science, vol 15046. Springer, Cham. https://doi.org/10.1007/978-3-031-77392-1_3

  • DOI: https://doi.org/10.1007/978-3-031-77392-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-77391-4

  • Online ISBN: 978-3-031-77392-1

  • eBook Packages: Computer Science, Computer Science (R0)
