Abstract
Random sampling of experiences and the unpredictability of moving obstacles make it challenging for mobile robots to learn navigation policies efficiently and avoid obstacles safely. Overcoming these challenges reduces the time required to train and validate navigation models and improves the safety and credibility of autonomous navigation in applications such as medical service and industrial patrol. This article proposes an improved soft actor–critic (SAC) model to enhance the autonomous navigation performance of mobile robots. We first introduce prioritized experience replay to reduce the randomness of sampling: by preferentially replaying high-value experiences, the navigation policy is learned more effectively. We also design a network with long short-term memory (LSTM) units to retain historical environmental information, so that the temporal characteristics of obstacle motion can be captured and exploited to refine the obstacle avoidance policy. Experimental results in both simulated and real-world environments show that the proposed model significantly improves learning speed, success rate, and trajectory smoothness while exhibiting strong obstacle avoidance performance in dynamic environments.
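For a concrete picture of the prioritized replay component, the following is a minimal sketch of proportional prioritization in the style of Schaul et al. (2015), on which the sampling scheme described above builds. It uses a plain array instead of the sum-tree of efficient implementations, and every name and hyperparameter value (alpha, beta, eps) is an illustrative assumption rather than the authors' code.

    import numpy as np

    class PrioritizedReplayBuffer:
        """Minimal proportional prioritized replay sketch.

        Transitions with larger TD error are sampled more often:
        P(i) = p_i^alpha / sum_k p_k^alpha, with importance-sampling
        weights w_i = (N * P(i))^(-beta) to correct the induced bias.
        """

        def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
            self.capacity = capacity
            self.alpha, self.beta, self.eps = alpha, beta, eps
            self.data = []          # stored transitions
            self.priorities = np.zeros(capacity, dtype=np.float64)
            self.pos = 0            # ring-buffer write index

        def add(self, transition):
            # New experiences get the current maximum priority so they
            # are replayed at least once before being down-weighted.
            max_p = self.priorities.max() if self.data else 1.0
            if len(self.data) < self.capacity:
                self.data.append(transition)
            else:
                self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
            self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size):
            p = self.priorities[:len(self.data)] ** self.alpha
            probs = p / p.sum()
            idx = np.random.choice(len(self.data), batch_size, p=probs)
            # Importance-sampling weights, normalized by the maximum
            # so gradient updates are only ever scaled down.
            weights = (len(self.data) * probs[idx]) ** (-self.beta)
            weights /= weights.max()
            batch = [self.data[i] for i in idx]
            return batch, idx, weights

        def update_priorities(self, idx, td_errors):
            # Priority is |TD error| plus a small epsilon so that no
            # transition's sampling probability collapses to zero.
            self.priorities[idx] = np.abs(td_errors) + self.eps

In a full agent of the kind described here, the SAC critics' TD errors would be fed back through update_priorities after each gradient step, and the memory component would additionally store short histories of observations so the recurrent (LSTM) actor and critics can condition on recent obstacle motion.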
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 52275003, in part by the Fundamental Research Funds for the Central Universities under Grant buctrc202105, and in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region under Grant 2022D01C673.
Ethics declarations
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wei, Z., Xiao, W., Yuan, L. et al. Memory-based soft actor–critic with prioritized experience replay for autonomous navigation. Intel Serv Robotics 17, 621–630 (2024). https://doi.org/10.1007/s11370-024-00514-9