
Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems


Abstract

Approximate policy iteration (API) is studied in this paper to solve undiscounted optimal control problems. A discrete-time system with a continuous state space and a finite action set is considered. Because an approximation technique must be used for the continuous state space, approximation errors enter the computation and disturb the convergence of the original policy iteration. We analyze and prove the convergence of API for undiscounted optimal control. Approximate policy evaluation is implemented by an iterative method, and we show that the error between the approximate and exact value functions is bounded. Since the action set is finite, the greedy policy in the policy improvement step is generated directly. Our main theorem proves that, if a sufficiently accurate approximator is used, API converges to the optimal policy. For implementation, we introduce a fuzzy approximator and verify its performance on the puddle world problem.
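The following is a minimal sketch, in Python, of the general scheme the abstract describes: approximate policy evaluation by repeatedly applying an undiscounted Bellman backup and projecting it onto a function approximator, followed by direct greedy policy improvement over the finite action set. It is not the authors' implementation; the linear-in-features approximator stands in for the paper's fuzzy approximator, and the callables `step` (system model), `cost` (stage cost), `phi` (feature map), together with the sample set `states` and the list `actions`, are hypothetical placeholders.

```python
import numpy as np

def policy_evaluation(policy, phi, step, cost, states, n_iters=200):
    """Approximate policy evaluation: iteratively fit V(x) ~ phi(x) @ w
    for the current policy using an undiscounted Bellman backup."""
    w = np.zeros(phi(states[0]).shape[0])
    for _ in range(n_iters):
        features, targets = [], []
        for x in states:
            u = policy(x)
            x_next = step(x, u)
            # Undiscounted backup: V(x) <- cost(x, u) + V(x_next)
            features.append(phi(x))
            targets.append(cost(x, u) + phi(x_next) @ w)
        # Least-squares projection of the backup onto the feature space
        w, *_ = np.linalg.lstsq(np.asarray(features),
                                np.asarray(targets), rcond=None)
    return w

def greedy_policy(w, phi, step, cost, actions):
    """Policy improvement: with a finite action set, the greedy action
    is obtained by direct enumeration."""
    def policy(x):
        q = [cost(x, u) + phi(step(x, u)) @ w for u in actions]
        return actions[int(np.argmin(q))]
    return policy

def approximate_policy_iteration(phi, step, cost, states, actions, n_outer=20):
    """Alternate approximate evaluation and greedy improvement."""
    policy = lambda x: actions[0]  # arbitrary initial policy
    for _ in range(n_outer):
        w = policy_evaluation(policy, phi, step, cost, states)
        policy = greedy_policy(w, phi, step, cost, actions)
    return policy, w
```

Under the paper's argument, the projection error introduced in the evaluation step is what must be kept small: if the approximator is sufficiently accurate, the iteration above converges to the optimal policy despite the per-step approximation error.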



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant No. 61273136), the State Key Laboratory of Robotics and System (SKLRS-2015-ZD-04), and the National Science Foundation (NSF) under Grant ECCS 1053717.

Author information


Corresponding author

Correspondence to Dongbin Zhao.


About this article


Cite this article

Zhu, Y., Zhao, D., He, H. et al. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems. Cogn Comput 7, 763–771 (2015). https://doi.org/10.1007/s12559-015-9350-z

