Abstract
We apply the proximal policy optimization (PPO) reinforcement learning algorithm to optimize a stochastic control strategy for model-free speed control of a quadrotor. The quadrotor is controlled by four learned neural networks that map system states directly to control commands in an end-to-end fashion. By introducing an integral compensator into the actor-critic framework, speed-tracking accuracy and robustness are greatly enhanced. In addition, a two-phase learning scheme, consisting of an offline phase and an online phase, is developed for practical use: a model with strong generalization ability is learned offline, and the flight policy is then continuously refined online. Finally, the performance of the proposed algorithm is compared with that of a traditional PID controller.
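To make the two key ingredients concrete, the sketch below illustrates (a) augmenting the observation with an accumulated speed-tracking error, in the spirit of the integral compensator, and (b) the standard PPO clipped surrogate objective. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the gain k_i, the observation layout, and the clip range eps=0.2 are illustrative choices.

```python
import numpy as np

def augment_with_integral(obs, vel_error, integral, dt, k_i=0.1):
    """Integral compensator (illustrative): accumulate the speed-tracking
    error and append it to the observation, so the policy can learn to
    cancel steady-state error. k_i and this interface are assumptions."""
    integral = integral + vel_error * dt
    return np.concatenate([obs, k_i * integral]), integral

def ppo_clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate:
    L(theta) = E[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)],
    where r_t is the probability ratio of new to old policy."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.mean(np.minimum(ratio * advantage, clipped))

# Toy usage: one compensated observation and one surrogate evaluation.
obs = np.zeros(9)                       # e.g., attitude + angular/linear rates
vel_error = np.array([0.3, -0.1, 0.0])  # desired minus measured velocity
obs_aug, integral = augment_with_integral(obs, vel_error,
                                          integral=np.zeros(3), dt=0.01)
loss = -ppo_clipped_surrogate(ratio=np.array([1.1, 0.8]),
                              advantage=np.array([0.5, -0.2]))
```

Under the paper's two-phase scheme, such a surrogate would be maximized on simulated rollouts in the offline phase, and the same update rule could then keep refining the policy on real flight data online.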
Author information
Contributions
Qing-ling WANG guided the research. Huan HU performed the experiments and drafted, revised, and finalized the paper.
Ethics declarations
Huan HU and Qing-ling WANG declare that they have no conflict of interest.
Additional information
Project supported by the National Key R&D Program of China (No. 2018AAA0101400), the National Natural Science Foundation of China (Nos. 61973074, U1713209, 61520106009, and 61533008), the Science and Technology on Information System Engineering Laboratory (No. 05201902), and the Fundamental Research Funds for the Central Universities, China
About this article
Cite this article
Hu, H., Wang, Ql. Proximal policy optimization with an integral compensator for quadrotor control. Front Inform Technol Electron Eng 21, 777–795 (2020). https://doi.org/10.1631/FITEE.1900641