Abstract
In this paper, we design and train a neural network controller for quadrotor attitude control, with the aim of extending quadrotors to more complex scenarios and more challenging tasks. The controller allows the quadrotor to reject strong disturbances and to achieve high-dynamic control. Because quadrotor attitude control is a complex, high-dimensional control problem, we propose a new framework that combines supervised learning and reinforcement learning (RL) to train the controller. The controller maps the states of the quadrotor to the rotor control commands in an end-to-end manner. In addition, we propose a survival-of-the-fittest principle for neural network preservation to obtain a better policy network during RL training. Numerical simulations demonstrate that, under more severe disturbances, the neural network controller trained by our method has better anti-disturbance ability than the proportional-integral-derivative (PID) method and the incremental nonlinear dynamic inversion (INDI) method, and that the controller supports high-dynamic control, enabling the quadrotor to reach large attitude angles.
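The abstract does not spell out how the survival-of-the-fittest principle is applied, so the following is only a minimal sketch of one plausible reading: after each training update, the candidate network is scored, and the best-scoring snapshot seen so far is preserved. The function `preserve_fittest`, the rollback to the survivor after a worse update, and the toy `toy_update` and `toy_score` helpers are all assumptions made for illustration, not the authors' implementation.

```python
import copy
import random

# Hypothetical stand-ins: a "policy" is just a list of floats here, and the
# score is higher when the parameters are closer to an arbitrary target.
_rng = random.Random(0)
_TARGET = [1.0, -2.0]


def toy_update(params):
    """Stand-in for one RL update: perturb every parameter with noise."""
    return [p + _rng.gauss(0.0, 0.3) for p in params]


def toy_score(params):
    """Stand-in for an evaluation return: negative squared error to target."""
    return -sum((p - t) ** 2 for p, t in zip(params, _TARGET))


def preserve_fittest(initial_params, update_fn, evaluate_fn, iterations):
    """Return the best-scoring parameter snapshot seen during training.

    Assumption: a candidate that scores worse than the current survivor is
    discarded and training resumes from the survivor.
    """
    best_params = copy.deepcopy(initial_params)
    best_score = evaluate_fn(best_params)
    params = copy.deepcopy(initial_params)
    for _ in range(iterations):
        params = update_fn(params)
        score = evaluate_fn(params)
        if score > best_score:
            # The candidate is fitter: it becomes the preserved network.
            best_params, best_score = copy.deepcopy(params), score
        else:
            # The candidate is worse: roll back to the preserved network.
            params = copy.deepcopy(best_params)
    return best_params, best_score


if __name__ == "__main__":
    start = [0.0, 0.0]
    best, score = preserve_fittest(start, toy_update, toy_score, 200)
    print("initial score:", toy_score(start))
    print("preserved score:", score)
```

By construction, the preserved score can never fall below the initial score, which is the property a preservation rule of this kind is meant to guarantee. In an actual RL setting the evaluation would be noisy, so the comparison would likely need averaging over several episodes.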
Acknowledgements
This work was supported by the National Natural Science Foundation of China under grant No. 61803009; the Fundamental Research Funds for the Central Universities under grant YWF-21-BJ-541; and the Aeronautical Science Foundation of China under grant No. 20175851032.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Li, M., Cai, Z., Zhao, J. et al. Disturbance rejection and high dynamic quadrotor control based on reinforcement learning and supervised learning. Neural Comput & Applic 34, 11141–11161 (2022). https://doi.org/10.1007/s00521-022-07033-7
DOI: https://doi.org/10.1007/s00521-022-07033-7