State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots

Journal of Zhejiang University SCIENCE C

Abstract

This paper presents a new Q-learning-based approach to mobile robot path planning in complex unknown static environments. As a computational approach to learning through interaction with the environment, reinforcement learning has been widely used for intelligent robot control, especially for autonomous mobile robots. However, the learning process is slow and cumbersome, whereas practical applications demand rapid convergence. To address the slow convergence and long learning time of Q-learning-based mobile robot path planning, a state-chain sequential feedback Q-learning algorithm is proposed for quickly finding the optimal path of a mobile robot in complex unknown static environments. The state chain is built during the search process. After an action is chosen and the reward is received, the Q-values of the state-action pairs on the previously built state chain are sequentially updated with the one-step Q-learning rule. As the number of Q-values updated after each action grows, the number of steps (state transitions) required for convergence drops, and the learning time drops with it. Extensive simulations validate the efficiency of the proposed approach for mobile robot path planning in complex environments. The results show that the new approach converges quickly and that the robot finds the collision-free optimal path in complex unknown static environments in much less time than with the one-step Q-learning algorithm or the Q(λ)-learning algorithm.
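
The algorithm is compact enough to state in code. The following is a minimal, self-contained Python sketch of the idea as described above: the agent keeps the chain of transitions built during the search, and after each new action it re-applies the one-step Q-learning update to every state-action pair stored on that chain, so a single action refreshes many Q-values. The toy GridWorld environment, the reward values, the epsilon-greedy action selection, and the backward sweep order are illustrative assumptions, not the authors' published implementation.

```python
import random

# Hyperparameters (illustrative values, not taken from the paper)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # 4-connected grid moves

class GridWorld:
    """Toy static grid: start at (0, 0), goal at the far corner, a few obstacles."""
    def __init__(self, size=10, obstacles=frozenset({(3, 3), (3, 4), (4, 3)})):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.obstacles = obstacles

    def reset(self):
        return (0, 0)

    def done(self, s):
        return s == self.goal

    def step(self, s, a):
        nxt = (s[0] + a[0], s[1] + a[1])
        if (not (0 <= nxt[0] < self.size and 0 <= nxt[1] < self.size)
                or nxt in self.obstacles):
            return s, -1.0                     # blocked: stay put, small penalty
        return nxt, (100.0 if nxt == self.goal else -0.1)

def one_step_update(Q, s, a, r, s2):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q.get((s2, a2), 0.0) for a2 in ACTIONS)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)

def run_episode(env, Q, max_steps=100000):
    s = env.reset()
    chain = []                                 # state chain built during this search
    steps = 0
    while not env.done(s) and steps < max_steps:
        if random.random() < EPSILON:          # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a2: Q.get((s, a2), 0.0))
        s2, r = env.step(s, a)
        chain.append((s, a, r, s2))
        # Sequential feedback: after this single action, sweep the chain and
        # re-apply the one-step update to every stored pair, so the new reward
        # information propagates along the previously visited states.
        for si, ai, ri, si2 in reversed(chain):
            one_step_update(Q, si, ai, ri, si2)
        s = s2
        steps += 1
    return steps

if __name__ == "__main__":
    env, Q = GridWorld(), {}
    for ep in range(50):
        print(f"episode {ep}: reached goal in {run_episode(env, Q)} steps")
```

Unlike Q(λ), which weights updates along the trajectory with decaying eligibility traces, this sketch re-applies the plain one-step update to every pair on the chain; the abstract attributes the reduced number of steps to convergence to this denser feedback after each action.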

References

  • Agirrebeitia, J., Aviles, R., de Bustos, I.F., Ajuria, G., 2005. A new APF strategy for path planning in environments with obstacles. Mech. Mach. Theory, 40(6):645–658. [doi:10.1016/j.mechmachtheory.2005.01.006]

  • Alexopoulos, C., Griffin, P.M., 1992. Path planning for a mobile robot. IEEE Trans. Syst. Man Cybern., 22(2):318–322. [doi:10.1109/21.148404]

  • Al-Taharwa, I., Sheta, A., Al-Weshah, M., 2008. A mobile robot path planning using genetic algorithm in static environment. J. Comput. Sci., 4(4):341–344.

  • Barraquand, J., Langlois, B., Latombe, J.C., 1992. Numerical potential field techniques for robot path planning. IEEE Trans. Syst. Man Cybern., 22(2):224–241. [doi:10.1109/21.148426]

  • Cao, Q., Huang, Y., Zhou, J., 2006. An Evolutionary Artificial Potential Field Algorithm for Dynamic Path Planning of Mobile Robot. Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, p.3331–3336. [doi:10.1109/IROS.2006.282508]

  • Castillo, O., Trujillo, L., Melin, P., 2007. Multiple objective genetic algorithms for path-planning optimization in autonomous mobile robots. Soft Comput., 11(3):269–279. [doi:10.1007/s00500-006-0068-4]

  • Dearden, R., Friedman, N., Russell, S., 1998. Bayesian Q-Learning. Proc. National Conf. on Artificial Intelligence, p.761–768.

  • Dolgov, D., Thrun, S., Montemerlo, M., Diebel, J., 2010. Path planning for autonomous vehicles in unknown semi-structured environments. Int. J. Robot. Res., 29(5):485–501. [doi:10.1177/0278364909359210]

  • Framling, K., 2007. Guiding exploration by pre-existing knowledge without modifying reward. Neur. Networks, 20(6):736–747. [doi:10.1016/j.neunet.2007.02.001]

  • Garcia, M.A., Montiel, O., Castillo, O., Sepulveda, R., Melin, P., 2009. Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation. Appl. Soft Comput., 9(3):1102–1110. [doi:10.1016/j.asoc.2009.02.014]

  • Ge, S.S., Cui, Y.J., 2002. Dynamic motion planning for mobile robots using potential field method. Auton. Robots, 13(3):207–222. [doi:10.1023/A:1020564024509]

  • Ghatee, M., Mohades, A., 2009. Motion planning in order to optimize the length and clearance applying a Hopfield neural network. Expert Syst. Appl., 36(3):4688–4695. [doi:10.1016/j.eswa.2008.06.040]

  • Gong, D., Lu, L., Li, M., 2009. Robot Path Planning in Uncertain Environments Based on Particle Swarm Optimization. Proc. IEEE Congress on Evolutionary Computation, p.2127–2134. [doi:10.1109/CEC.2009.4983204]

  • Guo, M., Liu, Y., Malec, J., 2004. A new Q-learning algorithm based on the metropolis criterion. IEEE Trans. Syst. Man Cybern. B, 34(5):2140–2143. [doi:10.1109/TSMCB.2004.832154]

  • Hachour, O., 2009. The proposed fuzzy logic navigation approach of autonomous mobile robots in unknown environments. Int. J. Math. Models Methods Appl. Sci., 3(3):204–218.

  • Hamagami, T., Hirata, H., 2003. An Adjustment Method of the Number of States of Q-Learning Segmenting State Space Adaptively. Proc. IEEE Int. Conf. on Systems, Man and Cybernetics, 4:3062–3067. [doi:10.1109/ICSMC.2003.1244360]

  • Hart, P.E., Nilsson, N.J., Raphael, B., 1968. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern., 4(2):100–107. [doi:10.1109/TSSC.1968.300136]

  • Hwang, H.J., Viet, H.H., Chung, T., 2011. Q(λ) based vector direction for path planning problem of autonomous mobile robots. Lect. Notes Electr. Eng., 107(Part 4):433–442. [doi:10.1007/978-94-007-2598-0_46]

  • Jaradat, M.A.K., Al-Rousan, M., Quadan, L., 2011. Reinforcement based mobile robot navigation in dynamic environment. Robot. Comput.-Integr. Manuf., 27(1):135–149. [doi:10.1016/j.rcim.2010.06.019]

  • Jin, Z., Liu, W., Jin, J., 2009. Partitioning the State Space by Critical States. Proc. 4th Int. Conf. on Bio-Inspired Computing, p.1–7. [doi:10.1109/BICTA.2009.5338123]

  • Kala, R., Shukla, A., Tiwari, R., 2010. Fusion of probabilistic A* algorithm and fuzzy inference system for robotic path planning. Artif. Intell. Rev., 33(4):307–327. [doi:10.1007/s10462-010-9157-y]

  • Koenig, S., Simmons, R.G., 1996. The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms. Mach. Learn., 22(1–3):227–250. [doi:10.1007/BF00114729]

  • Lampton, A., Valasek, J., 2009. Multiresolution State-Space Discretization Method for Q-Learning. Proc. American Control Conf., p.1646–1651. [doi:10.1109/ACC.2009.5160474]

  • Latombe, J.C., 1991. Robot Motion Planning. Kluwer Academic Publishers. [doi:10.1007/978-1-4615-4022-9]

  • Oh, C.H., Nakashima, T., Ishibuchi, H., 1998. Initialization of Q-Values by Fuzzy Rules for Accelerating Q-Learning. Proc. IEEE World Congress on Computational Intelligence and IEEE Int. Joint Conf. on Neural Networks, 3:2051–2056. [doi:10.1109/IJCNN.1998.687175]

  • Peng, J., Williams, R.J., 1996. Incremental multi-step Q-learning. Mach. Learn., 22(1–3):283–290. [doi:10.1023/A:1018076709321]

  • Poty, A., Melchior, P., Oustaloup, A., 2004. Dynamic Path Planning for Mobile Robots Using Fractional Potential Field. Proc. 1st Int. Symp. on Control, Communications and Signal Processing, p.557–561. [doi:10.1109/ISCCSP.2004.1296443]

  • Saab, Y., VanPutte, M., 1999. Shortest path planning on topographical maps. IEEE Trans. Syst. Man Cybern. A, 29(1):139–150. [doi:10.1109/3468.736370]

  • Senda, K., Mano, S., Fujii, S., 2003. A Reinforcement Learning Accelerated by State Space Reduction. SICE Annual Conf., 2:1992–1997.

  • Song, Y., Li, Y., Li, C., Zhang, G., 2012. An efficient initialization approach of Q-learning for mobile robots. Int. J. Control Autom. Syst., 10(1):166–172. [doi:10.1007/s12555-012-0119-9]

  • Still, S., Precup, D., 2012. An information-theoretic approach to curiosity-driven reinforcement learning. Theory Biosci., 131(3):139–148. [doi:10.1007/s12064-011-0142-z]

  • Sutton, R.S., Barto, A.G., 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.

  • Tsai, C.C., Huang, H.C., Chan, C.K., 2011. Parallel elite genetic algorithm and its application to global path planning for autonomous robot navigation. IEEE Trans. Ind. Electron., 58(10):4813–4821. [doi:10.1109/TIE.2011.2109332]

  • Wang, Z., Zhu, X., Han, Q., 2011. Mobile robot path planning based on parameter optimization ant colony algorithm. Proc. Eng., 15:2738–2741. [doi:10.1016/j.proeng.2011.08.515]

  • Watkins, C.J.C.H., 1989. Learning from Delayed Rewards. PhD Thesis, University of Cambridge, Cambridge, UK.

  • Watkins, C.J.C.H., Dayan, P., 1992. Q-learning. Mach. Learn., 8(3–4):279–292.

  • Wiewiora, E., 2003. Potential-based shaping and Q-value initialization are equivalent. J. Artif. Intell. Res., 19:205–208.

  • Yang, S.X., Meng, M., 2000. An efficient neural network approach to dynamic robot motion planning. Neur. Networks, 13(2):143–148. [doi:10.1016/S0893-6080(99)00103-3]

  • Yang, X., Moallem, M., Patel, R.V., 2005. A layered goal-oriented fuzzy motion planning strategy for mobile robot navigation. IEEE Trans. Syst. Man Cybern. B, 35(6):1214–1224. [doi:10.1109/TSMCB.2005.850177]

  • Yun, S.C., Ganapathy, V., Chong, L.O., 2010. Improved Genetic Algorithms Based Optimum Path Planning for Mobile Robot. Proc. 11th Int. Conf. on Control, Automation, Robotics and Vision, p.1565–1570. [doi:10.1109/ICARCV.2010.5707781]

Author information

Correspondence to Xin Ma.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 61075091, 61105100, and 61240052), the Natural Science Foundation of Shandong Province, China (No. ZR2012FM036), and the Independent Innovation Foundation of Shandong University, China (Nos. 2011JC011 and 2012JC005).

About this article

Cite this article

Ma, X., Xu, Y., Sun, Gq. et al. State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots. J. Zhejiang Univ. - Sci. C 14, 167–178 (2013). https://doi.org/10.1631/jzus.C1200226

