Abstract
We apply the proximal policy optimization (PPO) reinforcement learning algorithm to optimize a stochastic control strategy for model-free speed control of a quadrotor. The quadrotor is controlled by four learned neural networks that map system states directly to control commands in an end-to-end fashion. By introducing an integral compensator into the actor-critic framework, speed-tracking accuracy and robustness are greatly enhanced. In addition, a two-phase learning scheme, consisting of an offline phase and an online phase, is developed for practical use: a model with strong generalization ability is learned offline, and the flight policy is then continuously refined online. Finally, the performance of the proposed algorithm is compared with that of a traditional PID controller.
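To make the two key ingredients concrete, the sketch below illustrates (a) augmenting the observation with an accumulated speed-tracking error, in the spirit of the integral compensator, and (b) the standard PPO clipped surrogate objective. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the gain k_i, the observation layout, and the clip range eps=0.2 are illustrative choices.

```python
import numpy as np

def augment_with_integral(obs, vel_error, integral, dt, k_i=0.1):
    """Integral compensator (illustrative): accumulate the speed-tracking
    error and append it to the observation, so the policy can learn to
    cancel steady-state error. k_i and this interface are assumptions."""
    integral = integral + vel_error * dt
    return np.concatenate([obs, k_i * integral]), integral

def ppo_clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate:
    L(theta) = E[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)],
    where r_t is the probability ratio of new to old policy."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.mean(np.minimum(ratio * advantage, clipped))

# Toy usage: one compensated observation and one surrogate evaluation.
obs = np.zeros(9)                       # e.g., attitude + angular/linear rates
vel_error = np.array([0.3, -0.1, 0.0])  # desired minus measured velocity
obs_aug, integral = augment_with_integral(obs, vel_error,
                                          integral=np.zeros(3), dt=0.01)
loss = -ppo_clipped_surrogate(ratio=np.array([1.1, 0.8]),
                              advantage=np.array([0.5, -0.2]))
```

Under the paper's two-phase scheme, such a surrogate would be maximized on simulated rollouts in the offline phase, and the same update rule could then keep refining the policy on real flight data online.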
Author information
Contributions
Qing-ling WANG guided the research. Huan HU performed the experiments and drafted, revised, and finalized the paper.
Ethics declarations
Huan HU and Qing-ling WANG declare that they have no conflict of interest.
Additional information
Project supported by the National Key R&D Program of China (No. 2018AAA0101400), the National Natural Science Foundation of China (Nos. 61973074, U1713209, 61520106009, and 61533008), the Science and Technology on Information System Engineering Laboratory (No. 05201902), and the Fundamental Research Funds for the Central Universities, China
About this article
Cite this article
Hu, H., Wang, Ql. Proximal policy optimization with an integral compensator for quadrotor control. Front Inform Technol Electron Eng 21, 777–795 (2020). https://doi.org/10.1631/FITEE.1900641