Abstract
A new neural network approach is described for pole balancing, a benchmark learning control problem. The approach combines the Associative Search Element (ASE) of Barto, Sutton and Anderson [1] with a Neuro-Resistive Grid (NRG) [2] acting as the Adaptive Critic Element (ACE). The novel feature of the NRG is that it evaluates a state by propagating failure information to neighbouring nodes in the grid. The NRG is updated only on a failure, and it provides the ASE with a continuous internal reinforcement signal by comparing the value of the present state with that of the previous state. The resulting system learns more rapidly and with fewer computations than that of Barto et al. [1]. To establish a uniform basis for comparing pole-balancing algorithms, both systems are simulated using the benchmark parameters and tests specified in Geva and Sitte [3].
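The idea of propagating failure information through a grid and deriving an internal reinforcement signal from the change in state value can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional grid, the failure value of -1.0, the `leak` factor, the number of relaxation sweeps, and the function names `propagate_failure` and `internal_reinforcement` are all assumptions chosen for illustration.

```python
def propagate_failure(rows, cols, failure_cells, leak=0.9, sweeps=50):
    """Return a grid of state values after spreading failure information.

    Assumption: failure cells are clamped to -1.0, and every other cell
    repeatedly takes the leak-scaled mean of its in-grid 4-neighbours
    (Jacobi relaxation on a leaky resistive grid).
    """
    failures = set(failure_cells)
    values = [[-1.0 if (r, c) in failures else 0.0 for c in range(cols)]
              for r in range(rows)]
    for _ in range(sweeps):
        new = [row[:] for row in values]
        for r in range(rows):
            for c in range(cols):
                if (r, c) in failures:
                    continue  # clamped node keeps its failure value
                nbrs = [values[rr][cc]
                        for rr, cc in ((r - 1, c), (r + 1, c),
                                       (r, c - 1), (r, c + 1))
                        if 0 <= rr < rows and 0 <= cc < cols]
                new[r][c] = leak * sum(nbrs) / len(nbrs)
        values = new
    return values


def internal_reinforcement(values, previous, current):
    """Value of the present state minus the value of the previous state."""
    return values[current[0]][current[1]] - values[previous[0]][previous[1]]


# Hypothetical 5x5 grid with a single recorded failure in one corner.
grid = propagate_failure(5, 5, failure_cells=[(0, 0)])
toward_failure = internal_reinforcement(grid, previous=(0, 3), current=(0, 1))
away_from_failure = internal_reinforcement(grid, previous=(0, 1), current=(0, 3))
```

In this sketch the grid would be recomputed only when a new failure is recorded, while `internal_reinforcement` can be evaluated at every step. A move toward the failed cell yields a negative signal and a move away from it yields a positive one.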
References
1. Barto AG, Sutton RS, Anderson CW. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans Syst, Man & Cybern 1983; 13: 834–846
2. Bugmann G, Taylor JG, Denham MJ. Route finding by neural nets. In: Taylor JG (ed), Neural Networks, Unicom & Alfred Waller, UK, 1995, 217–231
3. Geva S, Sitte J. A Cartpole experiment benchmark for trainable controllers. IEEE Control Systems Magazine 1993; 13: 40–51
4. Rosen BE, Goodwin JM, Vidal JJ. Process control with adaptive range coding. Biol Cybern 1992; 66: 419–428
5. Sutton RS. Learning to predict by the method of temporal differences. Machine Learning 1988; 3: 9–44
6. Barto AG, Sutton RS, Watkins CJCH. Learning and sequential decision making. In: Gabriel M, Moore J (eds), Learning and Computational Neuroscience: Foundations of Adaptive Networks, MIT Press, Cambridge, MA, 1990, 539–602
7. Barto AG, Bradtke SJ, Singh SP. Learning to act using real-time dynamic programming. Artificial Intelligence 1995; 72: 81–138
8. Ribeiro CHC. Attentional mechanism as a strategy for generalisation in the Q-learning algorithm. In: Fogelman-Soulié F, Gallinari P (eds), Proc. ICANN '95, Paris, 1995; 1: 455–460
9. Connolly CI, Burns JB, Weiss R. Path planning using Laplace's equation. Proc IEEE Int Conf Robotics & Automation 1990; 2102–2106
10. Tarassenko L, Blake A. Analogue computation of collision-free paths. Proc IEEE Int Conf on Robotics & Automation, Sacramento, CA, 1991, 540–545
11. Sutton RS, Pinette B. The learning of world models by connectionist networks. Proc Seventh Ann Conf of the Cog Sci Soc, Lawrence Erlbaum, 1985, 54–64
12. Moore AW. Efficient memory-based learning for robot control. PhD thesis, University of Cambridge, 1990
13. Tesauro G. Temporal difference learning and TD-Gammon. Comm ACM 1995; 38(3): 58–68
14. Prokhorov DV, Santiago RA, Wunsch II DC. Adaptive critic designs: A case study for neurocontrol. Neural Networks 1995; 8(9): 1367–1372
About this article
Cite this article
Bapi, R.S., D'Cruz, B. & Bugmann, G. Neuro-Resistive Grid approach to trainable controllers: A pole balancing example. Neural Comput & Applic 5, 33–44 (1997). https://doi.org/10.1007/BF01414101