Abstract
Pure reinforcement learning does not scale well to domains with many degrees of freedom, particularly continuous domains. In this paper, we introduce a hybrid method in which a symbolic planner constructs an approximate solution to a control problem, and a numerical optimisation algorithm then refines the qualitative plan into an operational policy. The method is demonstrated on the problem of learning a stable walking gait for a bipedal robot, an example we use to illustrate the benefits of a multistrategy approach to robot learning.
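The two-stage recipe in the abstract (plan symbolically, then optimise numerically) can be sketched in a few lines. The sketch below is illustrative only: the plan steps, parameter bounds, hill-climbing optimiser, and `evaluate_gait` function are hypothetical stand-ins, not the chapter's actual implementation; `evaluate_gait` is a placeholder for trials on a simulator or the physical robot.

```python
# Minimal sketch of the hybrid approach, assuming hypothetical names.
# A symbolic planner would produce the qualitative plan; a numerical
# optimiser then tunes the plan's free parameters into a working gait.
import random

# A qualitative plan: ordered steps, each naming a joint motion and the
# bounds the symbolic planner places on its numeric parameter.
PLAN = [
    ("shift_weight_to_stance_leg", 0.0, 1.0),  # hip roll amplitude
    ("lift_swing_leg",             0.0, 0.5),  # knee flexion
    ("swing_leg_forward",          0.0, 0.8),  # hip pitch
    ("lower_swing_leg",            0.0, 0.5),  # knee extension
]

def evaluate_gait(params):
    """Stand-in for executing the parameterised gait and returning a
    score, e.g. distance walked before falling. The target values here
    are fictitious."""
    targets = [0.4, 0.2, 0.5, 0.25]
    return -sum((p - t) ** 2 for p, t in zip(params, targets))

def refine(plan, iterations=500, step=0.05):
    """Hill-climbing refinement of the plan's numeric parameters,
    standing in for whatever numerical optimiser is actually used."""
    params = [(lo + hi) / 2 for _, lo, hi in plan]  # start mid-range
    best = evaluate_gait(params)
    for _ in range(iterations):
        i = random.randrange(len(params))
        candidate = params[:]
        lo, hi = plan[i][1], plan[i][2]
        # Perturb one parameter, clipped to the planner's bounds.
        candidate[i] = min(hi, max(lo, candidate[i] + random.uniform(-step, step)))
        score = evaluate_gait(candidate)
        if score > best:  # keep only improving perturbations
            params, best = candidate, score
    return params, best

if __name__ == "__main__":
    params, score = refine(PLAN)
    for (name, _, _), p in zip(PLAN, params):
        print(f"{name}: {p:.3f}")
```

The design point the abstract makes is visible even in this toy: the symbolic plan fixes the structure and bounds of the search, so the numerical stage only tunes a handful of constrained parameters rather than learning a policy from scratch.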
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this chapter
Sammut, C., Yik, T.F. (2010). Multistrategy Learning for Robot Behaviours. In: Koronacki, J., Raś, Z.W., Wierzchoń, S.T., Kacprzyk, J. (eds) Advances in Machine Learning I. Studies in Computational Intelligence, vol 262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05177-7_23
DOI: https://doi.org/10.1007/978-3-642-05177-7_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05176-0
Online ISBN: 978-3-642-05177-7