
Aggressive and robust low-level control and trajectory tracking for quadrotors with deep reinforcement learning

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Executing accurate trajectory tracking with a high-performance low-level controller is crucial for deploying quadrotors in a wide range of scenarios, especially those involving uncertain disturbances. However, owing to the uncertainties of disturbed environments, developing effective low-level controllers with traditional model-based control schemes is challenging. This paper presents an aggressive and robust reinforcement learning (RL)-based low-level control policy for quadrotors. The policy maps the observed quadrotor state directly to motor thrust commands, without requiring a model of the quadrotor dynamics. In addition, a trajectory generation pipeline based on differential flatness is developed to improve tracking accuracy. Extensive simulations and real-world experiments validate the performance of the learned low-level control policy. The results indicate that it outperforms traditional proportional-integral-derivative (PID) control methods and related learning-based policies in both accuracy and robustness, particularly in environments with uncertain disturbances. Furthermore, the proposed policy responds aggressively in trajectory tracking even when the speed of the desired trajectory is increased to 6 m/s. Finally, the learned policy demonstrates strong vibration suppression and enables the quadrotor to recover to a hovering state from random initial conditions with a shorter response time.
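The core idea stated above, a policy network that maps the observed quadrotor state directly to motor thrust commands, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the observation layout (position error, velocity, flattened rotation matrix, body rates) and the two 64-unit hidden layers are assumptions for the example, and the weights here are random rather than trained by RL.

```python
import numpy as np

# Hypothetical observation layout: 3 (position error) + 3 (velocity)
# + 9 (flattened rotation matrix) + 3 (angular velocity) = 18 dims.
OBS_DIM = 3 + 3 + 9 + 3
ACT_DIM = 4  # one normalized thrust command per motor

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (untrained) weights for a feed-forward policy network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def policy(params, obs):
    """Map one observation vector to motor commands bounded in [-1, 1]."""
    h = obs
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)   # tanh hidden activations
    W, b = params[-1]
    return np.tanh(h @ W + b)    # tanh output keeps thrusts bounded

params = init_mlp([OBS_DIM, 64, 64, ACT_DIM])
obs = np.zeros(OBS_DIM)          # e.g. hovering exactly on the setpoint
cmd = policy(params, obs)
print(cmd.shape)                 # (4,)
```

The bounded tanh output is one common way to keep raw network outputs inside the actuator limits before rescaling them to physical thrusts; an RL algorithm such as PPO would train the weights against a tracking-error reward.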




Data availability

The data that support the findings of this study are available from the Department of Control Science and Engineering, Harbin Institute of Technology Shenzhen. Restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. However, data are available from the authors upon reasonable request.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61977019; and in part by the Shenzhen Fundamental Research Program under Grant JCYJ20220818102415033, Grant JSGG20201103093802006 and Grant KJZD20230923114222045.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yanjie Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Chen, S., Li, Y., Lou, Y. et al. Aggressive and robust low-level control and trajectory tracking for quadrotors with deep reinforcement learning. Neural Comput & Applic 37, 1223–1240 (2025). https://doi.org/10.1007/s00521-024-10675-4
