Abstract
To address the autonomous learning problem of the two-wheeled self-balancing robot, a novel adaptive tropism reward ADHDP method with a robustness property is proposed, which obtains tropism reward information adaptively online. The learning system comprises three networks: an action neural network (ANN), an adaptive tropism reward neural network (ATRNN), and a critic neural network (CNN). The design of the ATRNN borrows from the learning mechanism of the actor-critic architecture: from the primary binary reward signal, a continuous secondary reward signal is obtained adaptively and serves as the basis for the critic network's learning. Simulations on the two-wheeled self-balancing robot show that the proposed learning mechanism is effective, exhibits better progressive learning behavior, and ultimately attains optimal learning performance. Statistical comparison experiments further indicate that the proposed method has a certain anti-noise ability and better robust learning performance.
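The three-network layout described in the abstract can be sketched as follows. This is a minimal illustrative sketch only: the network sizes, learning rates, toy dynamics, and update rules are assumptions, not the authors' exact design, and the ATRNN's own training rule (which the paper models on actor-critic learning) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(n_in, n_hidden, n_out):
    # One-hidden-layer network with tanh hidden units.
    return {"W1": rng.normal(0.0, 0.3, (n_hidden, n_in)),
            "W2": rng.normal(0.0, 0.3, (n_out, n_hidden))}

def mlp_forward(net, x):
    h = np.tanh(net["W1"] @ x)
    return net["W2"] @ h, h

def mlp_update(net, x, grad_out, lr=0.05):
    # One gradient step moving the output along -grad_out.
    _, h = mlp_forward(net, x)
    dh = (net["W2"].T @ grad_out) * (1.0 - h ** 2)
    net["W2"] -= lr * np.outer(grad_out, h)
    net["W1"] -= lr * np.outer(dh, x)

def critic_grad_action(net, x, n_state):
    # dJ/du: backpropagate the critic's output to its action inputs,
    # the usual ADHDP training signal for the action network.
    h = np.tanh(net["W1"] @ x)
    dJ_dx = net["W1"].T @ (net["W2"].ravel() * (1.0 - h ** 2))
    return dJ_dx[n_state:]

n_state = 4                            # e.g. tilt, tilt rate, position, speed
actor   = mlp_init(n_state, 6, 1)      # ANN: state -> control
tropism = mlp_init(n_state + 1, 6, 1)  # ATRNN: (state, binary reward) -> shaped reward
critic  = mlp_init(n_state + 1, 6, 1)  # CNN: (state, action) -> cost-to-go J

gamma = 0.95
state = rng.normal(0.0, 0.1, n_state)
prev_J, prev_input = None, None
for t in range(50):
    u, _ = mlp_forward(actor, state)
    # Toy stand-in for the robot dynamics (not the paper's model).
    next_state = np.clip(state + 0.02 * rng.normal(0.0, 1.0, n_state)
                         - 0.05 * u[0], -1.0, 1.0)
    r_binary = -1.0 if abs(next_state[0]) > 0.5 else 0.0  # primary binary reward
    # ATRNN maps the sparse binary signal to a continuous secondary reward.
    r_shaped, _ = mlp_forward(tropism, np.append(state, r_binary))
    x_c = np.append(state, u)
    J, _ = mlp_forward(critic, x_c)
    if prev_J is not None:
        td_error = prev_J - (r_shaped + gamma * J)  # Bellman residual
        mlp_update(critic, prev_input, td_error)
    # Action network descends the critic's cost estimate w.r.t. the control.
    mlp_update(actor, state, critic_grad_action(critic, x_c, n_state))
    prev_J, prev_input = J, x_c
    state = next_state
```

The key structural point the sketch illustrates is the flow of reward information: the critic is trained not on the raw binary failure signal but on the continuous signal produced by the tropism-reward network, which is what distinguishes the three-network design from standard two-network ADHDP.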
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Chen, J., Li, Z. (2013). A Novel Adaptive Tropism Reward ADHDP Method with Robust Property. In: Liu, D., Alippi, C., Zhao, D., Hussain, A. (eds.) Advances in Brain Inspired Cognitive Systems. BICS 2013. Lecture Notes in Computer Science, vol. 7888. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38786-9_33
Print ISBN: 978-3-642-38785-2
Online ISBN: 978-3-642-38786-9