Mapless Motion Planning System for an Autonomous Underwater Vehicle Using Policy Gradient-based Deep Reinforcement Learning

Published in: Journal of Intelligent & Robotic Systems (2019)

Abstract

This research addresses the motion planning problem encountered by underactuated autonomous underwater vehicles (AUVs) in mapless environments. A motion planning system based on deep reinforcement learning is proposed. The system directly optimizes the policy and plans end to end, taking sensor information as input and producing continuous surge force and yaw moment as output; it can reach multiple target points in sequence while avoiding obstacles. In addition, this study proposes a reward curriculum training method to solve the problem that the number of samples required for random exploration grows exponentially with the number of steps needed to obtain a reward, while avoiding the negative effects of intermediate rewards. The proposed system demonstrates good planning ability in mapless environments, transfers well to other unknown environments, and is robust to ocean-current disturbances. Simulation results show that the proposed mapless motion planning system can guide an underactuated AUV to its desired targets without colliding with any obstacles.
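(The article itself includes no source code. As an illustrative sketch only, the PyTorch snippet below shows the general shape of such a system: a Gaussian policy mapping sensor observations to continuous surge force and yaw moment, a plain REINFORCE-style policy-gradient update standing in for the authors' policy-optimization algorithm, and a staged reward in the spirit of the reward curriculum, with dense shaping active only in the early stage. The names AUVPolicy, policy_gradient_step, and curriculum_reward, the network sizes, and the reward scales are all hypothetical, not taken from the paper.)

```python
import torch
import torch.nn as nn

class AUVPolicy(nn.Module):
    """Gaussian policy: sensor observation -> continuous [surge force, yaw moment]."""
    def __init__(self, obs_dim: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, 2)               # mean action (2 DoF)
        self.log_std = nn.Parameter(torch.zeros(2))  # state-independent exploration noise

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        mean = torch.tanh(self.mu(self.body(obs)))   # bound actuator commands to [-1, 1]
        return torch.distributions.Normal(mean, self.log_std.exp())

def policy_gradient_step(policy, optimizer, obs, actions, returns):
    """One REINFORCE-style update: ascend E[log pi(a|s) * G]."""
    dist = policy(obs)
    log_prob = dist.log_prob(actions).sum(dim=-1)    # joint log-prob over both actions
    loss = -(log_prob * returns).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def curriculum_reward(stage: int, reached: bool, collided: bool, progress: float) -> float:
    """Hypothetical reward curriculum: dense shaping in the early stage only,
    so later training relies on the sparse terminal reward alone."""
    r = 10.0 if reached else (-10.0 if collided else 0.0)
    if stage == 0:
        r += progress  # e.g. reduction in distance to the current target
    return r
```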



Author information

Corresponding author

Correspondence to Guocheng Zhang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Sun, Y., Cheng, J., Zhang, G. et al. Mapless Motion Planning System for an Autonomous Underwater Vehicle Using Policy Gradient-based Deep Reinforcement Learning. J Intell Robot Syst 96, 591–601 (2019). https://doi.org/10.1007/s10846-019-01004-2


Keywords

Navigation