DOI: 10.1145/3471287.3471288
research-article

Actor-Critic Neural Network Based Finite-time Control for Uncertain Robotic Systems

Published: 25 September 2021

ABSTRACT

This paper investigates reinforcement learning (RL) based finite-time control (FTC) of uncertain robotic systems. The proposed method combines a terminal sliding mode based finite-time controller with an actor-critic (AC) based RL loop that adjusts the output of the neural network. The terminal sliding mode controller is designed to ensure a calculable settling time, in contrast to conventional asymptotic stability. The AC-based RL loop uses a recursive least squares technique to update the critic network and a policy gradient algorithm to estimate the parameters of the actor network. We show that the AC loop improves the robustness of the terminal sliding mode controller both in the approaching stage and near equilibrium. The performance of the proposed controller is compared with that of the terminal sliding mode controller alone. The simulation results show that the proposed controller outperforms the pure terminal sliding mode controller and that AC is a successful supplement to FTC.
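To illustrate what a "calculable settling time" means, here is a minimal Python sketch. It is not the controller from the paper: it assumes the textbook scalar terminal sliding surface dynamics x_dot = -beta * sign(x) * |x|^(q/p) with beta > 0 and integers p > q > 0, for which the time to reach x = 0 has the closed form t_s = p / (beta * (p - q)) * |x0|^((p - q) / p). The function names, parameters, and numeric values are illustrative assumptions.

```python
import math


def tsm_settling_time(x0: float, beta: float, p: int, q: int) -> float:
    """Closed-form time for x to reach 0 under the ASSUMED dynamics
    x_dot = -beta * sign(x) * |x|**(q/p), with beta > 0 and p > q > 0."""
    if not (beta > 0 and p > q > 0):
        raise ValueError("need beta > 0 and p > q > 0")
    return p / (beta * (p - q)) * abs(x0) ** ((p - q) / p)


def simulate_reaching_time(x0: float, beta: float, p: int, q: int,
                           dt: float = 1e-4, tol: float = 1e-6,
                           t_max: float = 50.0) -> float:
    """Forward-Euler integration of the same assumed dynamics.
    Returns the first time at which |x| <= tol (or t_max if never reached)."""
    x, t = x0, 0.0
    while abs(x) > tol and t < t_max:
        # sign(x) * |x|**(q/p), written so that negative x is handled safely
        x -= dt * beta * math.copysign(abs(x) ** (q / p), x)
        t += dt
    return t


# Hypothetical values chosen only for illustration.
predicted = tsm_settling_time(x0=1.0, beta=2.0, p=5, q=3)   # 1.25
simulated = simulate_reaching_time(x0=1.0, beta=2.0, p=5, q=3)
```

In this sketch the predicted value follows directly from the formula, and the simulated reaching time should agree with it up to the integration step and stopping tolerance. By contrast, a linear law such as x_dot = -beta * x only decays exponentially and never reaches zero in finite time, which is the distinction the abstract draws with asymptotic stability.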


Published in

ICISDM '21: Proceedings of the 2021 5th International Conference on Information System and Data Mining
May 2021
162 pages
ISBN: 9781450389549
DOI: 10.1145/3471287

    Copyright © 2021 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited