Heuristic dynamic programming with internal goal representation

Abstract

In this paper, we analyze a goal representation heuristic dynamic programming (GrHDP) design, which augments heuristic dynamic programming with an internal goal structure, and apply it to the 2-D maze navigation problem. Classical reinforcement learning approaches to this problem in the literature assign no intermediate reward before the final goal is reached. We integrate an additional network, the goal network, into the traditional heuristic dynamic programming (HDP) design to provide an internal reward/goal representation. The architecture of the proposed approach is presented, followed by a simulation of the 2-D maze navigation problem on a 10 × 10 maze. For a fair comparison, we use the same simulation environment settings for the traditional HDP approach. Simulation results show that the proposed GrHDP converges faster in terms of the sum of squared errors and also achieves a lower final error.
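To make the goal-network idea concrete, the sketch below implements a simplified GrHDP-style learner on a 10 × 10 grid maze in Python. It is a minimal illustration under stated assumptions, not the authors' implementation: linear one-hot function approximators stand in for the paper's neural networks, an epsilon-greedy rule over the critic stands in for the action network, and GOAL, ALPHA, LR, EPS, and the loop sizes are hypothetical values chosen for illustration. What it does reproduce is the central mechanism: the goal network turns the sparse external reward r(t) into a dense internal signal s(t), which the critic then uses in place of r(t) in the usual HDP temporal-difference error.

```python
# A minimal GrHDP-style sketch on a 10 x 10 maze. Assumptions (not from the
# paper): linear one-hot approximators replace the neural networks, an
# epsilon-greedy policy over the critic replaces the action network, and
# GOAL, ALPHA, LR, EPS, and the loop sizes are illustrative values.
import numpy as np

N = 10                                        # 10 x 10 maze
GOAL = (N - 1, N - 1)                         # assumed goal cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, LR, EPS = 0.95, 0.05, 0.1              # discount, learning rate, exploration

def features(state, a_idx):
    """One-hot encoding of the state-action pair (x(t), u(t))."""
    f = np.zeros(N * N * len(ACTIONS))
    f[(state[0] * N + state[1]) * len(ACTIONS) + a_idx] = 1.0
    return f

w_goal = np.zeros(N * N * len(ACTIONS))      # goal network: s(t) = w_goal . f
w_critic = np.zeros(N * N * len(ACTIONS))    # critic network: J(t) = w_critic . f

def step(state, a_idx):
    """Maze dynamics: move if in bounds; external reward r only at the goal."""
    nxt = (state[0] + ACTIONS[a_idx][0], state[1] + ACTIONS[a_idx][1])
    if not (0 <= nxt[0] < N and 0 <= nxt[1] < N):
        nxt = state                          # bump into the wall, stay put
    return nxt, (1.0 if nxt == GOAL else 0.0)

def policy(state):
    """Stand-in for the action network: epsilon-greedy over the critic J."""
    if np.random.rand() < EPS:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax([w_critic @ features(state, a) for a in range(len(ACTIONS))]))

for episode in range(200):
    state = (0, 0)
    a = policy(state)
    for t in range(400):
        nxt, r = step(state, a)
        done = nxt == GOAL
        a_next = policy(nxt)
        f, f_next = features(state, a), features(nxt, a_next)
        # Goal network: train s(t) toward r(t) + alpha * s(t+1), turning the
        # sparse external reward into a dense internal reinforcement signal.
        s_next = 0.0 if done else w_goal @ f_next
        w_goal += LR * (r + ALPHA * s_next - w_goal @ f) * f
        # Critic: the internal signal s(t) replaces r(t) in the HDP
        # temporal-difference error for J(t).
        s = w_goal @ f
        j_next = 0.0 if done else w_critic @ f_next
        w_critic += LR * (s + ALPHA * j_next - w_critic @ f) * f
        if done:
            break
        state, a = nxt, a_next
```

In a traditional HDP baseline the critic bootstraps directly on r(t), so little useful learning signal reaches it until the goal has been found at least once; routing learning through s(t) removes that bottleneck, which is consistent with the faster convergence the paper reports.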

Acknowledgments

This work was supported by the National Science Foundation (NSF) under grant CAREER ECCS 1053717, Army Research Office (ARO) under grant W911NF-12-1-0378, and NSF-DFG Collaborative Research on “Autonomous Learning” (a supplement grant to CNS 1117314).

Author information

Corresponding author

Correspondence to Haibo He.

Additional information

Communicated by C. Alippi, D. Zhao and D. Liu.

About this article

Cite this article

Ni, Z., He, H. Heuristic dynamic programming with internal goal representation. Soft Comput 17, 2101–2108 (2013). https://doi.org/10.1007/s00500-013-1112-9

Keywords

Navigation