Abstract
The mismatch between training and deployment distributions has long hindered the transfer of deep reinforcement learning (DRL) algorithms to real robots. This paper proposes a novel DRL-based path planner and a corresponding training method that achieve safe obstacle avoidance on real quadrotors. To bridge the simulation-to-reality gap, we design a randomized environment generation module that accounts for the simulation-reality error. The map information is then parameterized so that the test data are statistically meaningful. In addition, an instruction filter is proposed to smooth the output of the policy network at test time; its benefit to obstacle-avoidance performance is demonstrated in the experiment section. Finally, real-time flight experiments verify the effectiveness of our algorithm and show that a learning-based path planner can solve practical problems in robotics. Our framework has three advantages: (1) map parameterization, (2) low-cost planning, and (3) real-world validation. The video and code are available: https://github.com/Vinson-sheep/multi_rotor_avoidance_rl.
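The abstract does not define the instruction filter's form, so the sketch below only illustrates one plausible realization: a first-order low-pass (exponential moving average) filter applied to successive policy outputs at test time. The class name `InstructionFilter` and the parameter `alpha` are hypothetical, not taken from the paper or the linked repository.

```python
import numpy as np

class InstructionFilter:
    """First-order low-pass filter that smooths successive action commands.

    Hypothetical sketch; the paper's actual instruction filter may differ.
    alpha in (0, 1]: smaller values give smoother (more damped) commands.
    """

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self._prev = None  # last filtered command

    def __call__(self, action):
        action = np.asarray(action, dtype=float)
        if self._prev is None:
            self._prev = action
        else:
            # Blend the new policy output with the previous filtered command.
            self._prev = self.alpha * action + (1.0 - self.alpha) * self._prev
        return self._prev

# Usage at test time: filter the raw policy output before sending it
# to the flight controller.
#   filt = InstructionFilter(alpha=0.3)
#   smoothed_cmd = filt(policy(observation))
```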
Code Availability
The complete simulation data are available from the corresponding author upon request.
Funding
This work was supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515110815).
Author information
Contributions
All authors contributed to the study’s conception and design. Yongsheng Yang, Zhiwei Hou, Hongbo Chen and Peng Lu performed material preparation, data collection and analysis. Yongsheng Yang wrote the first draft of the manuscript and all authors commented on previous versions. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
Not applicable. Our manuscript does not report the results of studies involving humans or animals.
Consent to participate
Not applicable. Our manuscript does not report the results of studies involving humans or animals.
Consent for Publication
All authors have approved and consented to publish the manuscript.
Conflict of Interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, Y., Hou, Z., Chen, H. et al. DRL-based Path Planner and its Application in Real Quadrotor with LIDAR. J Intell Robot Syst 107, 38 (2023). https://doi.org/10.1007/s10846-023-01819-0