Abstract
Random sampling of experiences and the unpredictability of moving obstacles make it challenging for mobile robots to learn navigation policies efficiently and avoid obstacles safely. Overcoming these challenges reduces the time required to train and validate navigation models and improves the safety and credibility of autonomous navigation in applications such as medical service and industrial patrol. This article proposes an improved soft actor–critic (SAC) model to enhance the autonomous navigation performance of mobile robots. We first introduce prioritized experience replay to reduce the randomness of sampling: by preferentially replaying high-value experiences, the navigation policy is learned more effectively. We also design a network with long short-term memory (LSTM) units to retain historical environmental information, so that the temporal characteristics of obstacle motion can be captured and exploited to refine the obstacle avoidance policy. Experimental results in both simulated and real-world environments show that the proposed model significantly improves learning speed, success rate, and trajectory smoothness while exhibiting strong obstacle avoidance performance in dynamic environments.
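For a concrete picture of the prioritized replay component, the following is a minimal sketch of proportional prioritization in the style of Schaul et al. (2015), on which the sampling scheme described above builds. It uses a plain array instead of the sum-tree of efficient implementations, and every name and hyperparameter value (alpha, beta, eps) is an illustrative assumption rather than the authors' code.

    import numpy as np

    class PrioritizedReplayBuffer:
        """Minimal proportional prioritized replay sketch.

        Transitions with larger TD error are sampled more often:
        P(i) = p_i^alpha / sum_k p_k^alpha, with importance-sampling
        weights w_i = (N * P(i))^(-beta) to correct the induced bias.
        """

        def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
            self.capacity = capacity
            self.alpha, self.beta, self.eps = alpha, beta, eps
            self.data = []          # stored transitions
            self.priorities = np.zeros(capacity, dtype=np.float64)
            self.pos = 0            # ring-buffer write index

        def add(self, transition):
            # New experiences get the current maximum priority so they
            # are replayed at least once before being down-weighted.
            max_p = self.priorities.max() if self.data else 1.0
            if len(self.data) < self.capacity:
                self.data.append(transition)
            else:
                self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
            self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size):
            p = self.priorities[:len(self.data)] ** self.alpha
            probs = p / p.sum()
            idx = np.random.choice(len(self.data), batch_size, p=probs)
            # Importance-sampling weights, normalized by the maximum
            # so gradient updates are only ever scaled down.
            weights = (len(self.data) * probs[idx]) ** (-self.beta)
            weights /= weights.max()
            batch = [self.data[i] for i in idx]
            return batch, idx, weights

        def update_priorities(self, idx, td_errors):
            # Priority is |TD error| plus a small epsilon so that no
            # transition's sampling probability collapses to zero.
            self.priorities[idx] = np.abs(td_errors) + self.eps

In a full agent of the kind described here, the SAC critics' TD errors would be fed back through update_priorities after each gradient step, and the memory component would additionally store short histories of observations so the recurrent (LSTM) actor and critics can condition on recent obstacle motion.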
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 52275003, in part by the Fundamental Research Funds for the Central Universities under Grant buctrc202105, and in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region under Grant 2022D01C673.
Ethics declarations
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wei, Z., Xiao, W., Yuan, L. et al. Memory-based soft actor–critic with prioritized experience replay for autonomous navigation. Intel Serv Robotics 17, 621–630 (2024). https://doi.org/10.1007/s11370-024-00514-9