A Q-learning approach based on human reasoning for navigation in a dynamic environment

Rupeng Yuan; Fuhai Zhang; Yu Wang; Yili Fu; Shuguo Wang

doi:10.1017/S026357471800111X

Published online by Cambridge University Press: 30 October 2018

Affiliation (all authors): State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China. E-mails: yuanrupeng1991@163.com, yuwang_hit@163.com, meylfu_hit@163.com, wangxy_hit@163.com
*Corresponding author: Fuhai Zhang. E-mail: zfhhit@hit.edu.cn, meylfu_hit@163.com

Summary

Q-learning is often used for navigation in static environments, where the state space is easy to define. This paper proposes a new Q-learning approach for navigation in dynamic environments that imitates human reasoning. As a model-free method, Q-learning does not require a model of the environment in advance. The state space and the reward function of the proposed approach are defined according to human perception and evaluation, respectively. Specifically, states are defined by approximate regions rather than accurate measurements. Moreover, because robot dynamics limit which motions are feasible, the actions available in each state are computed using a dynamic window that takes robot dynamics into account. The conducted tests show that the obstacle avoidance rate of the proposed approach can reach 90.5% after training, and that the robot always operates within its dynamic limits.
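
The ideas in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the region boundaries, velocity and acceleration limits, the nine-action layout, and all function names (`perceive`, `dynamic_window`, `choose_action`, `q_update`) are assumptions made for illustration only. The sketch maps measurements to coarse regions, builds the set of velocities reachable within one control period, and applies the standard tabular Q-learning update.

```python
import random
from collections import defaultdict

# All thresholds and limits below are illustrative assumptions, not values from the paper.
DIST_EDGES = (0.5, 1.5)      # metres: split distance into near / medium / far
ANGLE_EDGES = (-0.5, 0.5)    # radians: split bearing into three coarse sectors
V_MAX, W_MAX = 1.0, 1.0      # linear (m/s) and angular (rad/s) velocity limits
A_MAX, AW_MAX = 0.5, 1.0     # linear (m/s^2) and angular (rad/s^2) acceleration limits
DT = 0.1                     # control period (s)
N_ACTIONS = 9                # 3 linear x 3 angular velocity choices


def region(value, edges):
    """Return the index of the approximate region that `value` falls into."""
    return sum(value >= e for e in edges)


def perceive(distance, bearing):
    """Map accurate measurements to a coarse state made of region indices."""
    return (region(distance, DIST_EDGES), region(bearing, ANGLE_EDGES))


def dynamic_window(v, w):
    """List the (v, w) pairs reachable in one control period, clipped to the limits."""
    vs = [min(max(v + k * A_MAX * DT, 0.0), V_MAX) for k in (-1, 0, 1)]
    ws = [min(max(w + k * AW_MAX * DT, -W_MAX), W_MAX) for k in (-1, 0, 1)]
    return [(vi, wi) for vi in vs for wi in ws]


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy choice of an index into the dynamic window."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q[state][a])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update: move Q(s, a) toward r + gamma * max Q(s', .)."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


# Q-table keyed by coarse state; unseen states start with all-zero action values.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)
```

In use, a control loop would call `perceive` on the latest sensor reading, pick an index with `choose_action`, command the matching pair from `dynamic_window(v, w)`, and then call `q_update` with the observed reward. Because every candidate velocity comes from the window, the commanded motion never exceeds the assumed acceleration or velocity limits.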

Keywords

Autonomous navigation; Mobile robot; Dynamic environment; Q-learning

Type: Articles
Information: Robotica, Volume 37, Issue 3, March 2019, pp. 445–468

DOI: https://doi.org/10.1017/S026357471800111X
Copyright: © Cambridge University Press 2018
