Basic flight maneuver generation of fixed-wing plane based on proximal policy optimization

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Autonomous agile flight control is challenging due to complex, highly nonlinear dynamics, and generating feasible basic flight maneuvers offline for subsequent online motion planning has become a practical solution. In this paper, we present a novel reinforcement learning-based method for generating basic flight maneuvers for a six-degrees-of-freedom aircraft model via Proximal Policy Optimization (PPO). Unlike traditional control methods that rely on model simplification or complex controller design, the proposed algorithm can automatically generate maneuvers simply by selecting a different aircraft and adding or removing reward components. First, we propose a new approach that ensures continuity and avoids large oscillations of the control command: the policy network outputs the time derivative of the command, which is then integrated to obtain the final control action applied to the aircraft. This design is effective for complex flight control tasks on a high-fidelity aircraft model. Second, the reward function used in PPO training is composed of desired objectives whose weights can adapt to the task type or to conditional triggers during training. With this method, we successfully generate most of the basic flight maneuvers, including level flight, coordinated turn, climb/descent, and horizontal roll. A series of simulation results shows that our algorithm not only learns these maneuvers quickly (within 0.2–10 h) but also outperforms the traditional PID control method in settling time and robustness while attaining similar accuracy.
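To make the first contribution concrete, below is a minimal Python sketch of a rate-output policy whose command is obtained by integration. The constants (DT, RATE_LIMIT, CMD_LIMIT) and the policy/aircraft_step stubs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch (not the authors' code): the policy network outputs the
# time derivative of each control command, and the command actually applied to
# the aircraft is obtained by integrating that derivative over time.

DT = 0.02          # control/simulation step in seconds (assumed)
RATE_LIMIT = 0.5   # bound on |d(command)/dt| per second (assumed)
CMD_LIMIT = 1.0    # normalized actuator deflection bounds (assumed)

def integrate_action(cmd_prev: np.ndarray, rate: np.ndarray) -> np.ndarray:
    """Convert the policy's rate output into a smooth, bounded command."""
    rate = np.clip(rate, -RATE_LIMIT, RATE_LIMIT)  # keep the derivative bounded
    cmd = cmd_prev + rate * DT                     # forward-Euler integration
    return np.clip(cmd, -CMD_LIMIT, CMD_LIMIT)     # respect actuator limits

# Usage inside a rollout loop (policy and aircraft_step are hypothetical stubs):
#   cmd = np.zeros(4)                      # aileron, elevator, rudder, throttle
#   for _ in range(horizon):
#       rate = policy(obs)                 # network outputs d(cmd)/dt
#       cmd = integrate_action(cmd, rate)  # continuous, oscillation-free command
#       obs = aircraft_step(cmd)
```

Because successive commands can differ by at most RATE_LIMIT * DT, the integrated command is continuous by construction, which is the source of the oscillation suppression described in the abstract.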
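Likewise, a reward built from weighted objective terms, with weights that depend on the task type and can be reshaped by a conditional trigger during training, could be sketched as follows. All component names, weight values, and the trigger rule here are hypothetical, chosen only to illustrate the mechanism.

```python
import numpy as np

# Hypothetical sketch of a weighted multi-objective reward whose weights depend
# on the task type and can be re-weighted by a conditional trigger during
# training. Not the paper's exact reward function.

def reward(errors: dict, weights: dict) -> float:
    """Weighted sum of shaped tracking errors; each term lies in (0, 1]."""
    return sum(w * np.exp(-abs(errors[k])) for k, w in weights.items())

# Task-dependent weight sets (values are placeholders)
LEVEL_FLIGHT = {"altitude": 0.4, "heading": 0.3, "roll": 0.3}
COORDINATED_TURN = {"altitude": 0.3, "turn_rate": 0.4, "sideslip": 0.3}

def adapt_weights(weights: dict, errors: dict) -> dict:
    """Conditional trigger (assumed rule): once the roll error is small,
    shift weight from roll stabilization toward altitude tracking."""
    if "roll" in weights and abs(errors.get("roll", 1.0)) < 0.05:
        w = dict(weights)
        w["altitude"] = w.get("altitude", 0.0) + 0.1
        w["roll"] = max(w["roll"] - 0.1, 0.0)
        return w
    return weights
```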

Data availability

The datasets generated during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work is supported in part by the Tianjin Science Fund for Distinguished Young Scholars under Grant 19JCJQJC62100, in part by the Tianjin Natural Science Foundation under Grant 20JCYBJC01470, and in part by the Fundamental Research Funds for the Central Universities. In addition, we thank the JSBSim community for their guidance on the use of the JSBSim FDM.

Author information

Corresponding author: Xuebo Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, L., Zhang, X., Qian, C. et al. Basic flight maneuver generation of fixed-wing plane based on proximal policy optimization. Neural Comput & Applic 35, 10239–10255 (2023). https://doi.org/10.1007/s00521-023-08232-6
