
A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games

Published in: International Journal of Fuzzy Systems

Abstract

In this work, we propose a new fuzzy reinforcement learning algorithm for differential games with continuous state and action spaces. Unlike the algorithms presented in the literature, which use direct algorithms to update the parameters of their function approximation systems, the proposed algorithm uses the residual gradient value iteration algorithm to tune both the input and output parameters of its function approximation systems. It has been shown in the literature that direct algorithms may fail to converge in some cases, while residual gradient algorithms are always guaranteed to converge to a local minimum. We call the proposed algorithm the residual gradient fuzzy actor–critic learning (RGFACL) algorithm and use it to learn three different pursuit–evasion differential games. Simulation results show that the proposed RGFACL algorithm outperforms the fuzzy actor–critic learning (FACL) and Q-learning fuzzy inference system (QLFIS) algorithms in terms of both convergence and speed of learning.
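The contrast the abstract draws between direct and residual gradient updates can be made concrete with a short sketch. The snippet below implements both update rules for one-step temporal-difference learning with a generic linear value function V(s) = wᵀφ(s). Note that the paper's function approximators are fuzzy inference systems with tunable input and output parameters, so the linear features, function names, and step sizes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def td_error(w, phi_s, phi_next, r, gamma):
    """One-step Bellman (TD) error for a linear value function V(s) = w @ phi(s)."""
    return r + gamma * (w @ phi_next) - (w @ phi_s)

def direct_update(w, phi_s, phi_next, r, gamma, alpha):
    """Direct (semi-gradient) TD update: the bootstrapped target r + gamma*V(s')
    is treated as a constant, so only V(s) is differentiated. Simple and fast,
    but known to diverge for some function approximators."""
    delta = td_error(w, phi_s, phi_next, r, gamma)
    return w + alpha * delta * phi_s

def residual_gradient_update(w, phi_s, phi_next, r, gamma, alpha):
    """Residual gradient update: true gradient descent on 0.5 * delta**2 for the
    given transition, so gamma*V(s') is differentiated as well. Slower, but
    converges to a local minimum of the mean squared Bellman residual."""
    delta = td_error(w, phi_s, phi_next, r, gamma)
    return w - alpha * delta * (gamma * phi_next - phi_s)

# Hypothetical usage on a single transition (s, r, s'); all values illustrative.
w = np.zeros(4)                             # value-function weights
phi_s = np.array([1.0, 0.0, 0.5, 0.0])      # feature vector of state s
phi_next = np.array([0.0, 1.0, 0.0, 0.5])   # feature vector of successor state s'
w = residual_gradient_update(w, phi_s, phi_next, r=1.0, gamma=0.95, alpha=0.1)
```

The only difference between the two rules is whether the bootstrapped target γV(s') is treated as a constant (direct) or differentiated along with V(s) (residual gradient); the latter makes each step a true gradient descent step on the squared Bellman residual, which is the property behind the convergence guarantee the abstract refers to.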



Author information

Correspondence to Mostafa D. Awheda.


About this article


Cite this article

Awheda, M.D., Schwartz, H.M. A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games. Int. J. Fuzzy Syst. 19, 1058–1076 (2017). https://doi.org/10.1007/s40815-016-0284-8
