State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots

Journal of Zhejiang University SCIENCE C

Abstract

This paper presents a new Q-learning-based approach to mobile robot path planning in complex unknown static environments. As a computational approach to learning through interaction with the environment, reinforcement learning has been widely used for intelligent robot control, especially for autonomous mobile robots. However, the learning process is slow and cumbersome, whereas practical applications demand rapid convergence. To address the slow convergence and long learning time of Q-learning-based mobile robot path planning, a state-chain sequential feedback Q-learning algorithm is proposed for quickly finding the optimal path of a mobile robot in complex unknown static environments. The state chain is built during the search process. After an action is chosen and the reward is received, the Q-values of the state-action pairs on the previously built state chain are sequentially updated with the one-step Q-learning rule. As the number of Q-values updated after each action grows, the number of steps (state transitions) required for convergence drops, and the learning time drops with it. Extensive simulations validate the efficiency of the proposed approach for mobile robot path planning in complex environments. The results show that the new approach converges quickly and that the robot finds the collision-free optimal path in complex unknown static environments in much less time than with the one-step Q-learning algorithm or the Q(λ)-learning algorithm.
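
The algorithm is compact enough to state in code. The following is a minimal, self-contained Python sketch of the idea as described above: the agent keeps the chain of transitions built during the search, and after each new action it re-applies the one-step Q-learning update to every state-action pair stored on that chain, so a single action refreshes many Q-values. The toy GridWorld environment, the reward values, the epsilon-greedy action selection, and the backward sweep order are illustrative assumptions, not the authors' published implementation.

```python
import random

# Hyperparameters (illustrative values, not taken from the paper)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # 4-connected grid moves

class GridWorld:
    """Toy static grid: start at (0, 0), goal at the far corner, a few obstacles."""
    def __init__(self, size=10, obstacles=frozenset({(3, 3), (3, 4), (4, 3)})):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.obstacles = obstacles

    def reset(self):
        return (0, 0)

    def done(self, s):
        return s == self.goal

    def step(self, s, a):
        nxt = (s[0] + a[0], s[1] + a[1])
        if (not (0 <= nxt[0] < self.size and 0 <= nxt[1] < self.size)
                or nxt in self.obstacles):
            return s, -1.0                     # blocked: stay put, small penalty
        return nxt, (100.0 if nxt == self.goal else -0.1)

def one_step_update(Q, s, a, r, s2):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q.get((s2, a2), 0.0) for a2 in ACTIONS)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)

def run_episode(env, Q, max_steps=100000):
    s = env.reset()
    chain = []                                 # state chain built during this search
    steps = 0
    while not env.done(s) and steps < max_steps:
        if random.random() < EPSILON:          # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a2: Q.get((s, a2), 0.0))
        s2, r = env.step(s, a)
        chain.append((s, a, r, s2))
        # Sequential feedback: after this single action, sweep the chain and
        # re-apply the one-step update to every stored pair, so the new reward
        # information propagates along the previously visited states.
        for si, ai, ri, si2 in reversed(chain):
            one_step_update(Q, si, ai, ri, si2)
        s = s2
        steps += 1
    return steps

if __name__ == "__main__":
    env, Q = GridWorld(), {}
    for ep in range(50):
        print(f"episode {ep}: reached goal in {run_episode(env, Q)} steps")
```

Unlike Q(λ), which weights updates along the trajectory with decaying eligibility traces, this sketch re-applies the plain one-step update to every pair on the chain; the abstract attributes the reduced number of steps to convergence to this denser feedback after each action.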

References

  • Agirrebeitia, J., Aviles, R., de Bustos, I.F., Ajuria, G., 2005. A new APF strategy for path planning in environments with obstacles. Mech. Mach. Theory, 40(6):645–658. [doi:10.1016/j.mechmachtheory.2005.01.006]

  • Alexopoulos, C., Griffin, P.M., 1992. Path planning for a mobile robot. IEEE Trans. Syst. Man Cybern., 22(2):318–322. [doi:10.1109/21.148404]

  • Al-Taharwa, I., Sheta, A., Al-Weshah, M., 2008. A mobile robot path planning using genetic algorithm in static environment. J. Comput. Sci., 4(4):341–344.

  • Barraquand, J., Langlois, B., Latombe, J.C., 1992. Numerical potential field techniques for robot path planning. IEEE Trans. Syst. Man Cybern., 22(2):224–241. [doi:10.1109/21.148426]

  • Cao, Q., Huang, Y., Zhou, J., 2006. An Evolutionary Artificial Potential Field Algorithm for Dynamic Path Planning of Mobile Robot. Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, p.3331–3336. [doi:10.1109/IROS.2006.282508]

  • Castillo, O., Trujillo, L., Melin, P., 2007. Multiple objective genetic algorithms for path-planning optimization in autonomous mobile robots. Soft Comput., 11(3):269–279. [doi:10.1007/s00500-006-0068-4]

  • Dearden, R., Friedman, N., Russell, S., 1998. Bayesian Q-Learning. Proc. National Conf. on Artificial Intelligence, p.761–768.

  • Dolgov, D., Thrun, S., Montemerlo, M., Diebel, J., 2010. Path planning for autonomous vehicles in unknown semi-structured environments. Int. J. Robot. Res., 29(5):485–501. [doi:10.1177/0278364909359210]

  • Framling, K., 2007. Guiding exploration by pre-existing knowledge without modifying reward. Neur. Networks, 20(6):736–747. [doi:10.1016/j.neunet.2007.02.001]

  • Garcia, M.A., Montiel, O., Castillo, O., Sepulveda, R., Melin, P., 2009. Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation. Appl. Soft Comput., 9(3):1102–1110. [doi:10.1016/j.asoc.2009.02.014]

  • Ge, S.S., Cui, Y.J., 2002. Dynamic motion planning for mobile robots using potential field method. Auton. Robots, 13(3):207–222. [doi:10.1023/A:1020564024509]

  • Ghatee, M., Mohades, A., 2009. Motion planning in order to optimize the length and clearance applying a Hopfield neural network. Expert Syst. Appl., 36(3):4688–4695. [doi:10.1016/j.eswa.2008.06.040]

  • Gong, D., Lu, L., Li, M., 2009. Robot Path Planning in Uncertain Environments Based on Particle Swarm Optimization. Proc. IEEE Congress on Evolutionary Computation, p.2127–2134. [doi:10.1109/CEC.2009.4983204]

  • Guo, M., Liu, Y., Malec, J., 2004. A new Q-learning algorithm based on the metropolis criterion. IEEE Trans. Syst. Man Cybern. B, 34(5):2140–2143. [doi:10.1109/TSMCB.2004.832154]

  • Hachour, O., 2009. The proposed fuzzy logic navigation approach of autonomous mobile robots in unknown environments. Int. J. Math. Models Methods Appl. Sci., 3(3):204–218.

  • Hamagami, T., Hirata, H., 2003. An Adjustment Method of the Number of States of Q-Learning Segmenting State Space Adaptively. Proc. IEEE Int. Conf. on Systems, Man and Cybernetics, 4:3062–3067. [doi:10.1109/ICSMC.2003.1244360]

  • Hart, P.E., Nilsson, N.J., Raphael, B., 1968. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern., 4(2):100–107. [doi:10.1109/TSSC.1968.300136]

  • Hwang, H.J., Viet, H.H., Chung, T., 2011. Q(λ) based vector direction for path planning problem of autonomous mobile robots. Lect. Notes Electr. Eng., 107(Part 4):433–442. [doi:10.1007/978-94-007-2598-0_46]

  • Jaradat, M.A.K., Al-Rousan, M., Quadan, L., 2011. Reinforcement based mobile robot navigation in dynamic environment. Robot. Comput.-Integr. Manuf., 27(1):135–149. [doi:10.1016/j.rcim.2010.06.019]

  • Jin, Z., Liu, W., Jin, J., 2009. Partitioning the State Space by Critical States. Proc. 4th Int. Conf. on Bio-Inspired Computing, p.1–7. [doi:10.1109/BICTA.2009.5338123]

  • Kala, R., Shukla, A., Tiwari, R., 2010. Fusion of probabilistic A* algorithm and fuzzy inference system for robotic path planning. Artif. Intell. Rev., 33(4):307–327. [doi:10.1007/s10462-010-9157-y]

  • Koenig, S., Simmons, R.G., 1996. The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms. Mach. Learn., 22(1–3):227–250. [doi:10.1007/BF00114729]

  • Lampton, A., Valasek, J., 2009. Multiresolution State-Space Discretization Method for Q-Learning. Proc. American Control Conf., p.1646–1651. [doi:10.1109/ACC.2009.5160474]

  • Latombe, J.C., 1991. Robot Motion Planning. Kluwer Academic Publishers. [doi:10.1007/978-1-4615-4022-9]

  • Oh, C.H., Nakashima, T., Ishibuchi, H., 1998. Initialization of Q-Values by Fuzzy Rules for Accelerating Q-Learning. Proc. IEEE World Congress on Computational Intelligence and IEEE Int. Joint Conf. on Neural Networks, 3:2051–2056. [doi:10.1109/IJCNN.1998.687175]

  • Peng, J., Williams, R.J., 1996. Incremental multi-step Q-learning. Mach. Learn., 22(1–3):283–290. [doi:10.1023/A:1018076709321]

  • Poty, A., Melchior, P., Oustaloup, A., 2004. Dynamic Path Planning for Mobile Robots Using Fractional Potential Field. Proc. 1st Int. Symp. on Control, Communications and Signal Processing, p.557–561. [doi:10.1109/ISCCSP.2004.1296443]

  • Saab, Y., VanPutte, M., 1999. Shortest path planning on topographical maps. IEEE Trans. Syst. Man Cybern. A, 29(1):139–150. [doi:10.1109/3468.736370]

  • Senda, K., Mano, S., Fujii, S., 2003. A Reinforcement Learning Accelerated by State Space Reduction. SICE Annual Conf., 2:1992–1997.

  • Song, Y., Li, Y., Li, C., Zhang, G., 2012. An efficient initialization approach of Q-learning for mobile robots. Int. J. Control Autom. Syst., 10(1):166–172. [doi:10.1007/s12555-012-0119-9]

  • Still, S., Precup, D., 2012. An information-theoretic approach to curiosity-driven reinforcement learning. Theory Biosci., 131(3):139–148. [doi:10.1007/s12064-011-0142-z]

  • Sutton, R.S., Barto, A.G., 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.

  • Tsai, C.C., Huang, H.C., Chan, C.K., 2011. Parallel elite genetic algorithm and its application to global path planning for autonomous robot navigation. IEEE Trans. Ind. Electron., 58(10):4813–4821. [doi:10.1109/TIE.2011.2109332]

  • Wang, Z., Zhu, X., Han, Q., 2011. Mobile robot path planning based on parameter optimization ant colony algorithm. Proc. Eng., 15:2738–2741. [doi:10.1016/j.proeng.2011.08.515]

  • Watkins, C.J.C.H., 1989. Learning from Delayed Rewards. PhD Thesis, University of Cambridge, Cambridge, UK.

  • Watkins, C.J.C.H., Dayan, P., 1992. Q-learning. Mach. Learn., 8(3–4):279–292.

  • Wiewiora, E., 2003. Potential-based shaping and Q-value initialization are equivalent. J. Artif. Intell. Res., 19:205–208.

  • Yang, S.X., Meng, M., 2000. An efficient neural network approach to dynamic robot motion planning. Neur. Networks, 13(2):143–148. [doi:10.1016/S0893-6080(99)00103-3]

  • Yang, X., Moallem, M., Patel, R.V., 2005. A layered goal-oriented fuzzy motion planning strategy for mobile robot navigation. IEEE Trans. Syst. Man Cybern. B, 35(6):1214–1224. [doi:10.1109/TSMCB.2005.850177]

  • Yun, S.C., Ganapathy, V., Chong, L.O., 2010. Improved Genetic Algorithms Based Optimum Path Planning for Mobile Robot. Proc. 11th Int. Conf. on Control, Automation, Robotics and Vision, p.1565–1570. [doi:10.1109/ICARCV.2010.5707781]

Author information

Correspondence to Xin Ma.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 61075091, 61105100, and 61240052), the Natural Science Foundation of Shandong Province, China (No. ZR2012FM036), and the Independent Innovation Foundation of Shandong University, China (Nos. 2011JC011 and 2012JC005).

About this article

Cite this article

Ma, X., Xu, Y., Sun, Gq. et al. State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots. J. Zhejiang Univ. - Sci. C 14, 167–178 (2013). https://doi.org/10.1631/jzus.C1200226

