
Disturbance rejection and high dynamic quadrotor control based on reinforcement learning and supervised learning

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

In this paper, we design and train a neural network controller for quadrotor attitude control to expand the application of quadrotors to more complex scenarios and challenging tasks. The neural network controller allows the quadrotor to reject strong disturbances and realize high dynamic control. Because quadrotor attitude control is a complex, high-dimensional control problem, we propose a new framework that combines supervised learning and reinforcement learning (RL) to train the neural network controller. The controller maps the states of the quadrotor to the control commands of the rotors in an end-to-end fashion. In addition, we propose a survival-of-the-fittest principle for neural network preservation to obtain a better policy network during RL training. Numerical simulations demonstrate that, under severe disturbances, the neural network controller trained by our method rejects disturbances better than the proportional-integral-derivative (PID) method and the incremental nonlinear dynamic inversion (INDI) method, and that it supports high dynamic control, enabling the quadrotor to achieve large attitude angles.
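
The abstract only summarizes the approach; the full training framework is described in the paper. As a rough, hypothetical sketch (not the authors' released code), the example below illustrates the two ideas mentioned above: an end-to-end policy network that maps quadrotor states directly to rotor commands, and a survival-of-the-fittest rule that preserves the best-performing policy snapshot found during RL training. The state layout, network sizes, and evaluation routine are illustrative assumptions.

```python
# Hypothetical sketch, not the authors' code: an end-to-end policy network mapping
# quadrotor attitude states to four rotor commands, plus a "survival of the
# fittest" rule that keeps only the best-performing policy snapshot seen so far.
import copy
import numpy as np

STATE_DIM = 9    # assumed state: attitude error (3), angular rates (3), integral terms (3)
ACTION_DIM = 4   # one normalized thrust command per rotor


class PolicyNetwork:
    """Small MLP: quadrotor state -> rotor commands in [0, 1]."""

    def __init__(self, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (STATE_DIM, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, ACTION_DIM))
        self.b2 = np.zeros(ACTION_DIM)

    def act(self, state):
        h = np.tanh(state @ self.w1 + self.b1)                   # hidden layer
        return 1.0 / (1.0 + np.exp(-(h @ self.w2 + self.b2)))    # squash to [0, 1]


def evaluate(policy, n_episodes=5):
    """Placeholder return estimate; a real setup would roll out the policy in a
    quadrotor simulator and average the episode rewards."""
    return float(np.random.default_rng().normal())


# Survival-of-the-fittest preservation: after each training iteration,
# keep the snapshot with the highest evaluation return rather than
# blindly keeping the most recent network.
policy = PolicyNetwork()
best_policy, best_return = copy.deepcopy(policy), -np.inf
for iteration in range(100):
    # ... one RL update of `policy` (e.g. a PPO step) would go here ...
    score = evaluate(policy)
    if score > best_return:
        best_return, best_policy = score, copy.deepcopy(policy)
```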



Acknowledgements

This work was supported by the National Natural Science Foundation of China under grant No. 61803009; the Fundamental Research Funds for the Central Universities under grant YWF-21-BJ-541; and the Aeronautical Science Foundation of China under grant No. 20175851032.

Author information


Corresponding author

Correspondence to Jiang Zhao.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, M., Cai, Z., Zhao, J. et al. Disturbance rejection and high dynamic quadrotor control based on reinforcement learning and supervised learning. Neural Comput & Applic 34, 11141–11161 (2022). https://doi.org/10.1007/s00521-022-07033-7

