
Disturbance rejection and high dynamic quadrotor control based on reinforcement learning and supervised learning

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

In this paper, we design and train a neural network controller for quadrotor attitude control to expand the application of quadrotors to more complex scenarios and challenging tasks. The neural network controller allows the quadrotor to reject strong disturbances and realize high dynamic control. Because quadrotor attitude control is a complex, high-dimensional control problem, we propose a new framework that combines supervised learning and reinforcement learning (RL) to train the neural network controller. The controller maps the states of the quadrotor to the control commands of the rotors in an end-to-end fashion. In addition, we propose a survival-of-the-fittest principle for neural network preservation to obtain a better policy network during RL training. Numerical simulations demonstrate that, under severe disturbances, the neural network controller trained by our method rejects disturbances better than the proportional-integral-derivative (PID) method and the incremental nonlinear dynamic inversion (INDI) method, and that it supports high dynamic control, enabling the quadrotor to achieve large attitude angles.
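
The abstract only summarizes the approach; the full training framework is described in the paper. As a rough, hypothetical sketch (not the authors' released code), the example below illustrates the two ideas mentioned above: an end-to-end policy network that maps quadrotor states directly to rotor commands, and a survival-of-the-fittest rule that preserves the best-performing policy snapshot found during RL training. The state layout, network sizes, and evaluation routine are illustrative assumptions.

```python
# Hypothetical sketch, not the authors' code: an end-to-end policy network mapping
# quadrotor attitude states to four rotor commands, plus a "survival of the
# fittest" rule that keeps only the best-performing policy snapshot seen so far.
import copy
import numpy as np

STATE_DIM = 9    # assumed state: attitude error (3), angular rates (3), integral terms (3)
ACTION_DIM = 4   # one normalized thrust command per rotor


class PolicyNetwork:
    """Small MLP: quadrotor state -> rotor commands in [0, 1]."""

    def __init__(self, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (STATE_DIM, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, ACTION_DIM))
        self.b2 = np.zeros(ACTION_DIM)

    def act(self, state):
        h = np.tanh(state @ self.w1 + self.b1)                   # hidden layer
        return 1.0 / (1.0 + np.exp(-(h @ self.w2 + self.b2)))    # squash to [0, 1]


def evaluate(policy, n_episodes=5):
    """Placeholder return estimate; a real setup would roll out the policy in a
    quadrotor simulator and average the episode rewards."""
    return float(np.random.default_rng().normal())


# Survival-of-the-fittest preservation: after each training iteration,
# keep the snapshot with the highest evaluation return rather than
# blindly keeping the most recent network.
policy = PolicyNetwork()
best_policy, best_return = copy.deepcopy(policy), -np.inf
for iteration in range(100):
    # ... one RL update of `policy` (e.g. a PPO step) would go here ...
    score = evaluate(policy)
    if score > best_return:
        best_return, best_policy = score, copy.deepcopy(policy)
```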



Acknowledgements

This work was supported by the National Natural Science Foundation of China under grant No. 61803009; the Fundamental Research Funds for the Central Universities under grant YWF-21-BJ-541; and the Aeronautical Science Foundation of China under grant No. 20175851032.

Author information


Corresponding author

Correspondence to Jiang Zhao.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, M., Cai, Z., Zhao, J. et al. Disturbance rejection and high dynamic quadrotor control based on reinforcement learning and supervised learning. Neural Comput & Applic 34, 11141–11161 (2022). https://doi.org/10.1007/s00521-022-07033-7

