Abstract
Autonomous agile flight control is a challenging problem because of complex, highly nonlinear dynamics, and generating feasible basic flight maneuvers offline for subsequent online motion planning has emerged as a solution. In this paper, we present a novel reinforcement learning-based method for generating basic flight maneuvers for a six-degrees-of-freedom aircraft model via Proximal Policy Optimization (PPO). Unlike traditional control methods that depend on model simplification or complex controller design, the proposed algorithm can generate maneuvers automatically, simply by selecting a different aircraft and adding or removing reward components. First, to ensure continuity of the control command and avoid large oscillations, we design the policy network to output the command's time derivative and then apply an integral operator to obtain the final control action exerted on the plane; this proves effective for complex flight control tasks on a high-fidelity aircraft model. Second, the reward function used in PPO training is composed of the desired objectives, whose weights can adapt to the task type or to conditional triggers during training. With this method, we successfully generate most of the basic flight maneuvers, including level flight, coordinated turn, climb/descent and horizontal roll. A series of simulation results show that the proposed algorithm not only learns these maneuvers quickly, within 0.2–10 h, but also outperforms the traditional PID control method in settling time and robustness while attaining similar accuracy.
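The rate-based action design described above can be illustrated with a minimal sketch, assuming forward-Euler integration at a fixed step `dt`, a policy output normalized to [-1, 1] per control channel, and clipping to actuator limits. The class name `IntegratedActionWrapper`, the channel layout, and all limits and rates are hypothetical, not taken from the paper. Because each step changes a command by at most `max_rate * dt`, the command exerted on the plane cannot jump discontinuously even when consecutive policy outputs differ sharply.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntegratedActionWrapper:
    """Hypothetical wrapper: the policy outputs command rates, not commands.

    Each channel's normalized rate in [-1, 1] is scaled by max_rate,
    integrated with forward Euler over dt, and clipped to [low, high].
    """
    low: List[float]
    high: List[float]
    max_rate: List[float]
    dt: float
    command: List[float] = field(default_factory=list)

    def reset(self, initial: List[float]) -> List[float]:
        # Start each episode from a known command (e.g. trim settings).
        if len(initial) != len(self.low):
            raise ValueError("initial command has wrong length")
        self.command = [min(hi, max(lo, u))
                        for u, lo, hi in zip(initial, self.low, self.high)]
        return list(self.command)

    def step(self, policy_output: List[float]) -> List[float]:
        if len(policy_output) != len(self.command):
            raise ValueError("policy output has wrong length")
        next_command = []
        for u, a, rate, lo, hi in zip(self.command, policy_output,
                                      self.max_rate, self.low, self.high):
            a = max(-1.0, min(1.0, a))        # bound the normalized rate
            u = u + a * rate * self.dt        # integrate du/dt over one step
            next_command.append(min(hi, max(lo, u)))  # actuator limits
        self.command = next_command
        return list(self.command)


if __name__ == "__main__":
    # Hypothetical channels: aileron, elevator, rudder, throttle.
    wrapper = IntegratedActionWrapper(
        low=[-1.0, -1.0, -1.0, 0.0], high=[1.0, 1.0, 1.0, 1.0],
        max_rate=[2.0, 2.0, 2.0, 0.5], dt=0.05)
    wrapper.reset([0.0, 0.0, 0.0, 0.5])
    # Even a maximal, sign-flipping policy output moves each channel
    # by at most max_rate * dt per step.
    print(wrapper.step([1.0, -1.0, 0.0, 1.0]))
    print(wrapper.step([-1.0, 1.0, 0.0, -1.0]))
```

The integrator state (`command`) must be reset with the aircraft at the start of each episode; otherwise the first command of a new episode would inherit the last one from the previous episode.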
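The adaptive reward can be sketched as a normalized weighted sum of per-objective scores, where the task type selects base weights and a conditional trigger re-weights them during an episode. This is a hypothetical illustration only: the component names, the Gaussian scoring, the tolerances, the weights, and the altitude-hold trigger are assumptions made for the example, not the paper's actual reward terms.

```python
import math
from typing import Callable, Dict

State = Dict[str, float]


def gaussian_score(error: float, tolerance: float) -> float:
    """Maps an error to (0, 1]; exactly 1 when the error is zero."""
    return math.exp(-((error / tolerance) ** 2))


# Hypothetical reward components, each scoring one objective in (0, 1].
COMPONENTS: Dict[str, Callable[[State], float]] = {
    "altitude": lambda s: gaussian_score(s["altitude_error_m"], 10.0),
    "heading": lambda s: gaussian_score(s["heading_error_deg"], 5.0),
    "roll": lambda s: gaussian_score(s["roll_error_deg"], 5.0),
}

# Hypothetical base weights per maneuver; a zero weight "subtracts" a term.
TASK_WEIGHTS: Dict[str, Dict[str, float]] = {
    "level_flight": {"altitude": 0.5, "heading": 0.4, "roll": 0.1},
    "coordinated_turn": {"altitude": 0.4, "heading": 0.0, "roll": 0.6},
}


def adaptive_weights(task: str, state: State) -> Dict[str, float]:
    weights = dict(TASK_WEIGHTS[task])  # copy so base weights stay intact
    # Hypothetical conditional trigger: once altitude is held within 5 m
    # in level flight, move half of the altitude weight onto heading.
    if task == "level_flight" and abs(state["altitude_error_m"]) < 5.0:
        shift = 0.5 * weights["altitude"]
        weights["altitude"] -= shift
        weights["heading"] += shift
    return weights


def shaped_reward(task: str, state: State) -> float:
    weights = adaptive_weights(task, state)
    total = sum(weights.values())
    # Normalizing by the total weight keeps the reward in [0, 1].
    return sum(w * COMPONENTS[k](state) for k, w in weights.items()) / total
```

With a 2 m altitude error, for instance, the trigger is active and the level-flight weights become 0.25/0.65/0.1, so the policy is pushed toward heading tracking once altitude is under control. Setting a weight to zero, as for heading in the turn task, removes that component without changing the rest of the reward.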
Data availability
The datasets generated during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work is supported in part by the Tianjin Science Fund for Distinguished Young Scholars under Grant 19JCJQJC62100, in part by the Tianjin Natural Science Foundation under Grant 20JCYBJC01470, and in part by the Fundamental Research Funds for the Central Universities. We also thank the JSBSim community for advice on using the JSBSim flight dynamics model (FDM).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, L., Zhang, X., Qian, C. et al. Basic flight maneuver generation of fixed-wing plane based on proximal policy optimization. Neural Comput & Applic 35, 10239–10255 (2023). https://doi.org/10.1007/s00521-023-08232-6
DOI: https://doi.org/10.1007/s00521-023-08232-6