
Aggressive and robust low-level control and trajectory tracking for quadrotors with deep reinforcement learning

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Executing accurate trajectory tracking with a high-performance low-level controller is crucial for deploying quadrotors in a wide range of scenarios, especially those involving uncertain disturbances. However, owing to the uncertainties of disturbed environments, developing effective low-level controllers with traditional model-based control schemes is challenging. This paper presents an aggressive and robust reinforcement learning (RL)-based low-level control policy for quadrotors. The policy maps the observed quadrotor state directly to motor thrust commands, without requiring a model of the quadrotor dynamics. In addition, a trajectory generation pipeline based on differential flatness is developed to improve tracking accuracy. Extensive simulations and real-world experiments validate the performance of the learned low-level control policy. The results indicate that it outperforms traditional proportional-integral-derivative (PID) control methods and related learning-based policies in both accuracy and robustness, particularly in environments with uncertain disturbances. Furthermore, the proposed policy responds aggressively in trajectory tracking even when the speed of the desired trajectory is increased to 6 m/s. Finally, the learned policy demonstrates strong vibration suppression and enables the quadrotor to recover to a hovering state from random initial conditions with a shorter response time.
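The core idea stated above, a policy network that maps the observed quadrotor state directly to motor thrust commands, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the observation layout (position error, velocity, flattened rotation matrix, body rates) and the two 64-unit hidden layers are assumptions for the example, and the weights here are random rather than trained by RL.

```python
import numpy as np

# Hypothetical observation layout: 3 (position error) + 3 (velocity)
# + 9 (flattened rotation matrix) + 3 (angular velocity) = 18 dims.
OBS_DIM = 3 + 3 + 9 + 3
ACT_DIM = 4  # one normalized thrust command per motor

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (untrained) weights for a feed-forward policy network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def policy(params, obs):
    """Map one observation vector to motor commands bounded in [-1, 1]."""
    h = obs
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)   # tanh hidden activations
    W, b = params[-1]
    return np.tanh(h @ W + b)    # tanh output keeps thrusts bounded

params = init_mlp([OBS_DIM, 64, 64, ACT_DIM])
obs = np.zeros(OBS_DIM)          # e.g. hovering exactly on the setpoint
cmd = policy(params, obs)
print(cmd.shape)                 # (4,)
```

The bounded tanh output is one common way to keep raw network outputs inside the actuator limits before rescaling them to physical thrusts; an RL algorithm such as PPO would train the weights against a tracking-error reward.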




Data availability

The data that support the findings of this study are available from the Department of Control Science and Engineering, Harbin Institute of Technology Shenzhen. Restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. However, data are available from the authors upon reasonable request.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61977019; and in part by the Shenzhen Fundamental Research Program under Grant JCYJ20220818102415033, Grant JSGG20201103093802006 and Grant KJZD20230923114222045.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yanjie Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Chen, S., Li, Y., Lou, Y. et al. Aggressive and robust low-level control and trajectory tracking for quadrotors with deep reinforcement learning. Neural Comput & Applic 37, 1223–1240 (2025). https://doi.org/10.1007/s00521-024-10675-4
