Abstract
In this paper we present a Reinforcement Learning method — B-Learning — for the control of a water production plant. A comparison between B-Learning and Dynamic Programming is provided from both theoretical and performance points of view. It is shown that Reinforcement-based neural control can lead to results comparable in quality to those obtained with Dynamic Programming, while being less computationally expensive.
The authors are grateful to Patrick Lesueur, who performed the experiments using Dynamic Programming.
¹ r(t) is equal to 0 if the system is in an allowed state and to −1 otherwise.
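The binary reward signal described in the footnote can be sketched as follows; the function name and boolean argument are illustrative, not from the paper:

```python
def reward(state_is_allowed: bool) -> float:
    """Failure-driven reward from the footnote: 0 while the plant stays
    in an allowed state, -1 as soon as a constraint is violated."""
    return 0.0 if state_is_allowed else -1.0
```

With this signal the controller receives no positive feedback; it is penalized only on entering a forbidden state, which is a common formulation for avoidance-style reinforcement learning tasks.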
References
Charles W. Anderson. Learning to control an inverted pendulum using neural networks. IEEE Control Systems Magazine, pages 31–37, April 1989.
Andrew G. Barto, Steven J. Bradtke, and Satinder P. Singh. Real-time learning and control using asynchronous dynamic programming. Technical Report 91-57, University of Massachusetts, Dept of Computer Science, Amherst MA 01003, August 1991.
Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. Neuronlike adaptive elements that can solve difficult learning problems. IEEE Transactions on Systems, Man and Cybernetics, SMC-13(5):834–846, September–October 1983.
J. C. Hoskins and D. M. Himmelblau. Process control via incremental neural networks and reinforcement learning. In Chicago Meeting of the American Institute of Chemical Engineers, Chicago, Illinois, November 1990.
Wayne C. Jouse and John G. Williams. The control of nuclear reactor start-up using drive reinforcement theory. In Cihan H. Dagli, Soundar R. T. Kumara, and Yung C. Shin, editors, Intelligent Engineering Systems Through Artificial Neural Networks, pages 537–544, St. Louis, Missouri, USA, November 1991.
R. Kora, P. Lesueur, and P. Villon. An adaptive optimal control algorithm for water treatment plants. To be published, 1992.
Thibault Langlois and Stéphane Canu. B-learning: a reinforcement learning variant for the control of a plant. In Intelligent Engineering Systems Through Artificial Neural Networks (ANNIE'92). ASME Press, 1992.
Thibault Langlois and Stéphane Canu. Control of time-delay systems using reinforcement learning. In Artificial Neural Networks, 2. Elsevier Science Publishers, 1992.
Long-Ji Lin. Programming robots using reinforcement learning and teaching. In Ninth National Conference on Artificial Intelligence, pages 781–786, 1991.
Long-Ji Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3–4):293–321, 1992.
Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44, 1988.
Richard S. Sutton. Integrated modeling and control based on reinforcement learning and dynamic programming. In Richard P. Lippmann, John E. Moody, and David S. Touretzky, editors, Advances in Neural Information Processing Systems, volume 3, pages 471–478. Morgan Kaufmann, 1990.
Christopher J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University Psychology Department, 1989.
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
Cite this paper
Langlois, T., Canu, S. (1993). B-Learning: A reinforcement learning algorithm, comparison with dynamic programming. In: Mira, J., Cabestany, J., Prieto, A. (eds) New Trends in Neural Computation. IWANN 1993. Lecture Notes in Computer Science, vol 686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56798-4_157
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56798-1
Online ISBN: 978-3-540-47741-9