Module-Based Reinforcement Learning: Experiments with a Real Robot


Abstract

The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solving such problems, in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: (i) decompose the task into subtasks using the qualitative knowledge at hand; (ii) design local controllers to solve the subtasks using the available quantitative knowledge; and (iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, yet high-quality robot control, as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA, and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous over nonadaptive ones in complex environments.
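The three-step design process can be made concrete with a small sketch. The following Python example is illustrative only and is not the authors' implementation: two hand-designed local controllers (modules) solve subtasks of a toy one-dimensional navigation problem, a coarse feature extractor stands in for the qualitative observation, and a tabular Q-learning coordinator learns which module to activate in each feature-state. The toy world, the modules named approach and hop, and the feature extractor are all assumptions made for this example; in the paper the modules are continuous-time controllers running on a real robot.

# Illustrative sketch only (not the authors' code): learning to switch
# between hand-designed local controllers ("modules") with tabular Q-learning.
import random

GOAL, OBSTACLE, SIZE = 9, 5, 10           # reach cell 9; cell 5 is blocked

def step(pos, move):
    """Apply a move; the blocked cell cannot be entered."""
    nxt = max(0, min(SIZE - 1, pos + move))
    return pos if nxt == OBSTACLE else nxt

# Hand-designed local controllers ("modules"); no fine-tuning is assumed.
def approach(pos):                        # greedy unit step toward the goal
    return 1 if pos < GOAL else -1

def hop(pos):                             # larger manoeuvre that can clear the blocked cell
    return 2

MODULES = [approach, hop]

def features(pos):
    """Coarse qualitative observation: (near the blocked cell?, goal ahead?)."""
    return (abs(pos - OBSTACLE) <= 1, pos < GOAL)

Q = {}                                    # values of (feature-state, module) pairs
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.1        # learning rate, discount, exploration

def greedy(s):
    return max(range(len(MODULES)), key=lambda m: Q.get((s, m), 0.0))

def choose(s):
    return random.randrange(len(MODULES)) if random.random() < EPS else greedy(s)

for episode in range(500):
    pos = 0
    for _ in range(50):
        s = features(pos)
        m = choose(s)
        pos = step(pos, MODULES[m](pos))
        r = 1.0 if pos == GOAL else -0.01  # sparse goal reward, small step cost
        s2 = features(pos)
        target = r + GAMMA * max(Q.get((s2, k), 0.0) for k in range(len(MODULES)))
        Q[(s, m)] = Q.get((s, m), 0.0) + ALPHA * (target - Q.get((s, m), 0.0))
        if pos == GOAL:
            break

# Inspect the learned switching strategy: the greedy module per feature-state.
for s in sorted({key[0] for key in Q}):
    print(s, "->", MODULES[greedy(s)].__name__)

Under these assumptions the coordinator only has to learn the switching decisions; the local controllers themselves stay fixed, which is what keeps the learning problem small.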




Cite this article

Kalmár, Z., Szepesvári, C. & Lőrincz, A. Module-Based Reinforcement Learning: Experiments with a Real Robot. Autonomous Robots 5, 273–295 (1998). https://doi.org/10.1023/A:1008858222277
