Abstract
The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous in both time and space, and moreover are partially observable. Here we suggest a systematic approach to solving such problems, in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: (i) decompose the task into subtasks using the qualitative knowledge at hand; (ii) design local controllers to solve the subtasks using the available quantitative knowledge; and (iii) learn to coordinate these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, yet still high-quality robot control, since no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA, and the model-based approach was found to work significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment that were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous over non-adaptive ones in complex environments.
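To make step (iii) concrete, the following Python sketch shows one possible way to learn such a switching strategy with tabular Q-learning, where the learner's actions are the indices of the local controllers. It is only an illustration under assumed names (ModuleCoordinator, select_module, update) and an assumed discrete, abstract state description; it is not the implementation used in the paper.

```python
# A minimal sketch, not the authors' implementation, of learning a switching
# (coordination) strategy over a fixed set of hand-designed local controllers
# ("modules") with tabular Q-learning.  The abstract state description and the
# reward signal stand in for the qualitative/quantitative knowledge supplied
# by steps (i) and (ii).
import random
from collections import defaultdict


class ModuleCoordinator:
    def __init__(self, n_modules, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.n_modules = n_modules                       # number of local controllers
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(lambda: [0.0] * n_modules)  # Q(abstract state, module)

    def select_module(self, state):
        # Epsilon-greedy choice of which controller to switch on in `state`.
        if random.random() < self.epsilon:
            return random.randrange(self.n_modules)
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, module, reward, next_state):
        # One-step Q-learning update, applied when the chosen module terminates.
        best_next = max(self.q[next_state])
        td_error = reward + self.gamma * best_next - self.q[state][module]
        self.q[state][module] += self.alpha * td_error
```

In this sketch a decision step is the execution of one module until its own termination condition fires, so learning operates on the coarse, abstract state space produced by the decomposition rather than on raw sensor readings; a model-based variant would additionally estimate the transition probabilities between abstract states.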
Kalmár, Z., Szepesvári, C. & Lőrincz, A. Module-Based Reinforcement Learning: Experiments with a Real Robot. Autonomous Robots 5, 273–295 (1998). https://doi.org/10.1023/A:1008858222277