
Neuro-Resistive Grid approach to trainable controllers: A pole balancing example

Abstract

A new neural network approach is described for pole balancing, a benchmark learning control problem. The approach combines Barto, Sutton and Anderson's [1] Associative Search Element (ASE) with a Neuro-Resistive Grid (NRG) [2] acting as the Adaptive Critic Element (ACE). The novel feature of the NRG is that it evaluates a state by propagating failure information to neighbouring cells in the grid. The NRG is updated only on failure, and it provides the ASE with a continuous internal reinforcement signal by comparing the value of the present state with that of the previous state. The resulting system learns more rapidly and with fewer computations than that of Barto et al. [1]. To establish a uniform basis for comparing pole-balancing algorithms, both systems are simulated using the benchmark parameters and tests specified by Geva and Sitte [3].
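As a rough illustration of the mechanism described above, the sketch below implements a small grid critic in Python. It is not the authors' code: the grid shape, the relaxation rule (four-neighbour averaging with the failed cell clamped to zero, in the spirit of resistive-grid/Laplace-equation relaxation [9]), and all names such as NeuroResistiveGridCritic, update_on_failure and internal_reinforcement are assumptions made purely for illustration. The value landscape is revised only when a failure occurs, and at every time step the internal reinforcement handed to the ASE is the change in value between the previous and the current state cell.

```python
import numpy as np

class NeuroResistiveGridCritic:
    """Illustrative grid critic (hypothetical names; not the published NRG code).

    The cart-pole state is assumed to be discretised into a 2-D grid of
    boxes. Cell values near 1 mean "far from failure"; the cell in which a
    failure occurred is clamped to 0, and the grid is relaxed so that the
    failure information spreads to neighbouring cells.
    """

    def __init__(self, shape=(9, 9), relax_iters=200):
        self.values = np.ones(shape)   # start optimistic: every box looks safe
        self.relax_iters = relax_iters

    def update_on_failure(self, failed_cell):
        """Called only when the pole falls or the cart leaves the track."""
        self.values[failed_cell] = 0.0
        for _ in range(self.relax_iters):
            # Pad with 1.0 so off-grid states count as "safe" boundary values.
            p = np.pad(self.values, 1, mode='constant', constant_values=1.0)
            # Each cell relaxes towards the mean of its four neighbours
            # (resistive-grid / Laplace-style relaxation).
            self.values = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] +
                                  p[1:-1, :-2] + p[1:-1, 2:])
            self.values[failed_cell] = 0.0  # keep the failure cell clamped

    def internal_reinforcement(self, prev_cell, curr_cell):
        """Continuous internal reinforcement for the ASE: positive when the
        controller moves towards safer (higher-valued) cells."""
        return float(self.values[curr_cell] - self.values[prev_cell])
```

In a learning loop, internal_reinforcement would be queried at every control step to adjust the ASE's weights, while update_on_failure fires only at the end of a failed trial; this is consistent with the claim above that the critic is cheap per step because the grid itself is touched only on failure.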


References

  1. Barto AG, Sutton RS, Anderson CW. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans Syst, Man & Cybern 1983; 13: 834–846

  2. Bugmann G, Taylor JG, Denham MJ. Route finding by neural nets. In: Taylor JG (ed), Neural Networks, Unicom & Alfred Waller, UK, 1995, 217–231

  3. Geva S, Sitte J. A Cartpole experiment benchmark for trainable controllers. IEEE Control Systems Magazine 1993; 13: 40–51

  4. Rosen BE, Goodwin JM, Vidal JJ. Process control with adaptive range coding. Biol Cybern 1992; 66: 419–428

  5. Sutton RS. Learning to predict by the method of temporal differences. Machine Learning 1988; 3: 9–44

  6. Barto AG, Sutton RS, Watkins CJCH. Learning and sequential decision making. In: Gabriel M, Moore J (eds), Learning and Computational Neuroscience: Foundations of Adaptive Networks, MIT Press, Cambridge, MA, 1990, 539–602

  7. Barto AG, Bradtke SJ, Singh SP. Learning to act using real-time dynamic programming. Artificial Intelligence 1995; 72: 81–138

  8. Ribeiro CHC. Attentional mechanism as a strategy for generalisation in the Q-learning algorithm. In: Fogelman-Soulié F, Gallinari P (eds), Proc ICANN '95, Paris, 1995; 1: 455–460

  9. Connolly CI, Burns JB, Weiss R. Path planning using Laplace's equation. Proc IEEE Int Conf Robotics & Automation 1990; 2102–2106

  10. Tarassenko L, Blake A. Analogue computation of collision-free paths. Proc IEEE Int Conf on Robotics & Automation, Sacramento, CA, 1991, 540–545

  11. Sutton RS, Pinette B. The learning of world models by connectionist networks. Proc Seventh Ann Conf of the Cog Sci Soc, Lawrence Erlbaum, 1985, 54–64

  12. Moore AW. Efficient memory-based learning for robot control. PhD thesis, University of Cambridge, 1990

  13. Tesauro G. Temporal difference learning and TD-Gammon. Comm ACM 1995; 38(3): 58–68

  14. Prokhorov DV, Santiago RA, Wunsch II DC. Adaptive critic designs: A case study for neurocontrol. Neural Networks 1995; 8(9): 1367–1372

Author information

Corresponding author

Correspondence to Raju S. Bapi.

About this article

Cite this article

Bapi, R.S., D'Cruz, B. & Bugmann, G. Neuro-Resistive Grid approach to trainable controllers: A pole balancing example. Neural Comput & Applic 5, 33–44 (1997). https://doi.org/10.1007/BF01414101

