Abstract
This paper presents a new Q-learning-based approach to mobile robot path planning in complex unknown static environments. As a computational approach to learning through interaction with the environment, reinforcement learning has been widely used for intelligent robot control, especially for autonomous mobile robots. However, the learning process is slow and cumbersome, whereas practical applications demand rapid convergence. To address the slow convergence and long learning time of Q-learning-based mobile robot path planning, a state-chain sequential feedback Q-learning algorithm is proposed for quickly finding the optimal path of a mobile robot in complex unknown static environments. The state chain is built during the search process. After an action is chosen and its reward is received, the Q-values of the state-action pairs on the previously built state chain are sequentially updated with one-step Q-learning. Because more Q-values are updated after each action, the number of steps (state transitions) needed for convergence decreases, and so does the learning time. Extensive simulations validate the efficiency of the proposed approach for mobile robot path planning in complex environments. The results show that the new approach converges quickly and that the robot finds the collision-free optimal path in complex unknown static environments in a much shorter time than with the one-step Q-learning algorithm and the Q(λ)-learning algorithm.
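The update scheme described in the abstract — replaying one-step Q-learning updates backwards along the chain of visited states after every single action — can be sketched as follows. This is a minimal illustrative implementation on a hypothetical 1-D corridor world, not the authors' code; the environment, reward values, and all hyperparameters are assumptions chosen for the sketch.

```python
import random

def train(n_states=8, alpha=0.5, gamma=0.9, eps=0.1, episodes=150, seed=1):
    """Tabular Q-learning with state-chain sequential feedback updates
    on a 1-D corridor: start at state 0, goal at the right end."""
    random.seed(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right

    def step(s, a):
        s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
        return s2, (100.0 if s2 == goal else -1.0)

    for _ in range(episodes):
        s, chain = 0, []  # chain records (state, action, reward, next_state)
        while s != goal and len(chain) < 4 * n_states:
            # epsilon-greedy action selection
            a = random.randrange(2) if random.random() < eps \
                else max((0, 1), key=lambda x: Q[s][x])
            s2, r = step(s, a)
            chain.append((s, a, r, s2))
            # Sequential feedback: after this single action, replay one-step
            # Q-updates backwards along the whole chain, so the new reward
            # information propagates to earlier states immediately.
            for cs, ca, cr, cs2 in reversed(chain):
                Q[cs][ca] += alpha * (cr + gamma * max(Q[cs2]) - Q[cs][ca])
            s = s2
    return Q

Q = train()
# Extract the greedy path from the learned Q-table.
path, s = [0], 0
while s != 7 and len(path) < 20:
    s = s - 1 if Q[s][0] > Q[s][1] else s + 1
    path.append(s)
print(path)  # the greedy policy should walk straight to the goal
```

The key difference from plain one-step Q-learning is the inner `for` loop: instead of one Q-value update per action, every state-action pair on the chain is updated, which is what lets reward information propagate back in a single action.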
Additional information
Project supported by the National Natural Science Foundation of China (Nos. 61075091, 61105100, and 61240052), the Natural Science Foundation of Shandong Province, China (No. ZR2012FM036), and the Independent Innovation Foundation of Shandong University, China (Nos. 2011JC011 and 2012JC005)
Cite this article
Ma, X., Xu, Y., Sun, Gq. et al. State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots. J. Zhejiang Univ. - Sci. C 14, 167–178 (2013). https://doi.org/10.1631/jzus.C1200226