
Memory-based soft actor–critic with prioritized experience replay for autonomous navigation

  • Original Research Paper
  • Published:
Intelligent Service Robotics

Abstract

Due to random sampling and the unpredictability of moving obstacles, it remains challenging for mobile robots to learn navigation policies effectively and avoid obstacles safely. Overcoming these challenges can reduce the time required to train and validate navigation models, improving the safety and credibility of autonomous navigation in applications such as medical service and industrial patrol. This article proposes an improved soft actor–critic model to enhance the autonomous navigation performance of mobile robots. We first introduce a prioritized experience replay method to reduce the randomness of sampling: by prioritizing the learning of high-value experiences, the navigation policy is improved. We also design a network with long short-term memory (LSTM) capabilities to store historical environmental information, so that the temporal characteristics of obstacle motion can be captured and used to optimize the obstacle-avoidance policy. Experimental results in both simulated and real-world environments show that the proposed model significantly improves learning speed, success rate, and trajectory smoothness while exhibiting excellent obstacle avoidance in dynamic environments.
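The two components named in the abstract, prioritized experience replay and an LSTM-based memory, are well-established building blocks, so a brief sketch may help make them concrete. Below is a minimal proportional prioritized replay buffer in the style of Schaul et al. (2015), in which transitions are sampled with probability P(i) ∝ p_i^α and importance-sampling weights correct the resulting bias. All class names, hyperparameters, and design details here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (after Schaul et al., 2015).

    Illustrative sketch only; the paper's variant may differ in details
    such as the priority definition or the annealing schedule for beta.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions receive the current maximum priority so they
        # are sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        prios = self.priorities[: len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights w_i = (N * P(i))^(-beta),
        # normalized by the maximum weight for stability.
        weights = (len(self.buffer) * probs[idxs]) ** (-self.beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idxs], idxs, weights

    def update_priorities(self, idxs, td_errors):
        # Priority = |TD error| + eps, so that no transition's sampling
        # probability ever collapses to zero.
        self.priorities[idxs] = np.abs(td_errors) + self.eps
```

During training, the critic's temporal-difference errors would be fed back through update_priorities, and the importance-sampling weights would scale the per-sample critic loss. The second ingredient is a recurrent actor: folding an LSTM layer into the soft actor–critic policy lets it condition on a short history of observations and thereby infer obstacle motion. The sketch below is likewise an assumed architecture, not the one reported in the paper (the tanh log-probability correction needed for the full SAC update is omitted for brevity).

```python
import torch
import torch.nn as nn

class RecurrentGaussianActor(nn.Module):
    """SAC-style actor with an LSTM over observation sequences, so the
    policy can exploit the temporal pattern of moving obstacles."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, seq_len, obs_dim); the LSTM summarizes the
        # recent history of range readings and obstacle positions.
        x = torch.relu(self.encoder(obs_seq))
        x, hidden_state = self.lstm(x, hidden_state)
        x = x[:, -1]  # memory state after the last time step
        mu = self.mu(x)
        std = self.log_std(x).clamp(-20, 2).exp()
        # Reparameterized sample squashed to [-1, 1], as in standard SAC.
        action = torch.tanh(torch.distributions.Normal(mu, std).rsample())
        return action, hidden_state
```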



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 52275003, in part by the Fundamental Research Funds for the Central Universities under Grant buctrc202105, and in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region under Grant 2022D01C673.

Author information


Corresponding author

Correspondence to Wendong Xiao.

Ethics declarations

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wei, Z., Xiao, W., Yuan, L. et al. Memory-based soft actor–critic with prioritized experience replay for autonomous navigation. Intel Serv Robotics 17, 621–630 (2024). https://doi.org/10.1007/s11370-024-00514-9
