Abstract
Executing accurate trajectory tracking with a high-performance low-level controller is crucial for deploying quadrotors in various scenarios, especially those involving uncertain disturbances. However, because of the uncertainties in disturbed environments, developing effective low-level controllers with traditional model-based control schemes is challenging. This paper presents an aggressive and robust reinforcement learning (RL)-based low-level control policy for quadrotors. The policy maps the observed quadrotor state directly to motor thrust commands, without requiring a model of the quadrotor dynamics. In addition, a trajectory generation pipeline based on differential flatness is developed to improve the accuracy of trajectory tracking tasks. Extensive simulations and real-world experiments are conducted to validate the performance of the learned low-level control policy. The results indicate that the RL-based policy outperforms traditional proportional-integral-derivative (PID) control methods and related learning-based policies in both accuracy and robustness, particularly in environments with uncertain disturbances. Furthermore, the proposed policy exhibits an aggressive response in trajectory tracking, even when the speed of the desired trajectory is increased to 6 m/s. It also demonstrates strong vibration suppression and enables the quadrotor to recover to a hovering state from random initial conditions with a shorter response time.
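To make the state-to-thrust mapping concrete, the following is a minimal sketch of such a low-level control policy: a small feed-forward network that maps an observed quadrotor state to four per-motor thrust commands. The state layout, layer sizes, and thrust limit are illustrative assumptions, not the architecture or parameters used in the paper, and the random weights stand in for a trained policy.

```python
import numpy as np

# Assumed state layout: position error (3), velocity (3),
# rotation matrix (9), body angular rates (3) -> 18 values.
STATE_DIM = 18
HIDDEN = 64        # hidden-layer width (assumption)
N_MOTORS = 4
MAX_THRUST = 7.0   # per-motor thrust limit in newtons (assumption)

# Untrained stand-in weights; a trained policy would supply these.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((HIDDEN, STATE_DIM)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((N_MOTORS, HIDDEN)) * 0.1
b2 = np.zeros(N_MOTORS)

def policy(state: np.ndarray) -> np.ndarray:
    """Map one state observation to four motor thrusts in [0, MAX_THRUST]."""
    h = np.tanh(W1 @ state + b1)          # tanh hidden activation
    a = np.tanh(W2 @ h + b2)              # raw actions squashed to (-1, 1)
    return 0.5 * (a + 1.0) * MAX_THRUST   # rescale to the physical thrust range

# For the all-zero state, tanh(0) = 0, so every motor gets mid-range thrust.
thrusts = policy(np.zeros(STATE_DIM))
```

Squashing the action through tanh before rescaling is a common way to keep commanded thrusts inside actuator limits; the controller loop would evaluate this mapping at the low-level control rate and send the result directly to the motors.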
Data availability
The data that support the findings of this study are available from the Department of Control Science and Engineering, Harbin Institute of Technology Shenzhen. Restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. However, data are available from the authors upon reasonable request.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61977019; and in part by the Shenzhen Fundamental Research Program under Grant JCYJ20220818102415033, Grant JSGG20201103093802006 and Grant KJZD20230923114222045.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, S., Li, Y., Lou, Y. et al. Aggressive and robust low-level control and trajectory tracking for quadrotors with deep reinforcement learning. Neural Comput & Applic 37, 1223–1240 (2025). https://doi.org/10.1007/s00521-024-10675-4