Basic flight maneuver generation of fixed-wing plane based on proximal policy optimization

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Autonomous agile flight control is challenging due to complex, highly nonlinear dynamics, and generating feasible basic flight maneuvers offline for subsequent online motion planning has become a practical solution. In this paper, we present a novel reinforcement learning-based method for generating basic flight maneuvers for a six-degrees-of-freedom aircraft model via Proximal Policy Optimization (PPO). Unlike traditional control methods that rely on model simplification or complex controller design, the proposed algorithm can automatically generate maneuvers simply by selecting a different aircraft and adding or removing reward components. First, we propose a new approach that ensures continuity and avoids large oscillations of the control command: the policy network outputs the time derivative of the command, which is then integrated to obtain the final control action applied to the aircraft. This design is effective for complex flight control tasks on a high-fidelity aircraft model. Second, the reward function used in PPO training is composed of desired objectives whose weights can adapt to the task type or to conditional triggers during training. With this method, we successfully generate most of the basic flight maneuvers, including level flight, coordinated turn, climb/descent, and horizontal roll. A series of simulation results shows that our algorithm not only learns these maneuvers quickly (within 0.2–10 h) but also outperforms the traditional PID control method in settling time and robustness while attaining similar accuracy.
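To make the first contribution concrete, below is a minimal Python sketch of a rate-output policy whose command is obtained by integration. The constants (DT, RATE_LIMIT, CMD_LIMIT) and the policy/aircraft_step stubs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch (not the authors' code): the policy network outputs the
# time derivative of each control command, and the command actually applied to
# the aircraft is obtained by integrating that derivative over time.

DT = 0.02          # control/simulation step in seconds (assumed)
RATE_LIMIT = 0.5   # bound on |d(command)/dt| per second (assumed)
CMD_LIMIT = 1.0    # normalized actuator deflection bounds (assumed)

def integrate_action(cmd_prev: np.ndarray, rate: np.ndarray) -> np.ndarray:
    """Convert the policy's rate output into a smooth, bounded command."""
    rate = np.clip(rate, -RATE_LIMIT, RATE_LIMIT)  # keep the derivative bounded
    cmd = cmd_prev + rate * DT                     # forward-Euler integration
    return np.clip(cmd, -CMD_LIMIT, CMD_LIMIT)     # respect actuator limits

# Usage inside a rollout loop (policy and aircraft_step are hypothetical stubs):
#   cmd = np.zeros(4)                      # aileron, elevator, rudder, throttle
#   for _ in range(horizon):
#       rate = policy(obs)                 # network outputs d(cmd)/dt
#       cmd = integrate_action(cmd, rate)  # continuous, oscillation-free command
#       obs = aircraft_step(cmd)
```

Because successive commands can differ by at most RATE_LIMIT * DT, the integrated command is continuous by construction, which is the source of the oscillation suppression described in the abstract.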
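Likewise, a reward built from weighted objective terms, with weights that depend on the task type and can be reshaped by a conditional trigger during training, could be sketched as follows. All component names, weight values, and the trigger rule here are hypothetical, chosen only to illustrate the mechanism.

```python
import numpy as np

# Hypothetical sketch of a weighted multi-objective reward whose weights depend
# on the task type and can be re-weighted by a conditional trigger during
# training. Not the paper's exact reward function.

def reward(errors: dict, weights: dict) -> float:
    """Weighted sum of shaped tracking errors; each term lies in (0, 1]."""
    return sum(w * np.exp(-abs(errors[k])) for k, w in weights.items())

# Task-dependent weight sets (values are placeholders)
LEVEL_FLIGHT = {"altitude": 0.4, "heading": 0.3, "roll": 0.3}
COORDINATED_TURN = {"altitude": 0.3, "turn_rate": 0.4, "sideslip": 0.3}

def adapt_weights(weights: dict, errors: dict) -> dict:
    """Conditional trigger (assumed rule): once the roll error is small,
    shift weight from roll stabilization toward altitude tracking."""
    if "roll" in weights and abs(errors.get("roll", 1.0)) < 0.05:
        w = dict(weights)
        w["altitude"] = w.get("altitude", 0.0) + 0.1
        w["roll"] = max(w["roll"] - 0.1, 0.0)
        return w
    return weights
```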

Data availability

The datasets generated during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work is supported in part by the Tianjin Science Fund for Distinguished Young Scholars under Grant 19JCJQJC62100, in part by the Tianjin Natural Science Foundation under Grant 20JCYBJC01470, and in part by the Fundamental Research Funds for the Central Universities. In addition, we thank the JSBSim community for their guidance on the use of the JSBSim FDM.

Author information

Corresponding author: Xuebo Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, L., Zhang, X., Qian, C. et al. Basic flight maneuver generation of fixed-wing plane based on proximal policy optimization. Neural Comput & Applic 35, 10239–10255 (2023). https://doi.org/10.1007/s00521-023-08232-6
