DOI: 10.1145/1329125.1329248

Towards reinforcement learning representation transfer

Published: 14 May 2007

Abstract

Transfer learning problems are typically framed as leveraging knowledge learned on a source task to improve learning on a related, but different, target task. Current transfer methods can successfully transfer knowledge between agents in different reinforcement learning tasks, reducing the time needed to learn the target. However, the complementary task of representation transfer, i.e., transferring knowledge between agents with different internal representations, has not been well explored. The goal in both types of transfer problem is the same: to reduce the time needed to learn the target with transfer, relative to learning the target without transfer. This work introduces one such representation transfer algorithm, which is implemented in a complex multiagent domain. Experiments demonstrate that transferring learned knowledge between different representations is both possible and beneficial.
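
To make the idea concrete, the sketch below shows one simple form of representation transfer. It is not the paper's algorithm: all names, sizes, and the discretization scheme are illustrative assumptions. Action values learned under a coarse state discretization seed a finer discretization of the same task, so the target learner starts from transferred estimates rather than from zeros.

    import numpy as np

    # Illustrative sizes: a coarse source representation and a finer target one.
    N_ACTIONS = 3        # hypothetical action count (e.g., hold/pass choices)
    COARSE_BINS = 10     # source: coarse discretization of a normalized state
    FINE_BINS = 40       # target: finer discretization of the same state

    def coarse_bin(s: float) -> int:
        """Map a state in [0, 1) to its coarse-representation bin."""
        return min(int(s * COARSE_BINS), COARSE_BINS - 1)

    # Stand-in for a Q-table learned on the source representation
    # (the Sarsa/Q-learning loop that would fill it is elided here).
    q_coarse = np.random.uniform(0.0, 1.0, size=(COARSE_BINS, N_ACTIONS))

    # Representation transfer: initialize each fine bin from the coarse bin
    # covering the same region of state space, instead of from zeros.
    q_fine = np.zeros((FINE_BINS, N_ACTIONS))
    for b in range(FINE_BINS):
        center = (b + 0.5) / FINE_BINS        # state at the bin's center
        q_fine[b] = q_coarse[coarse_bin(center)]

    # The target agent then refines q_fine with ordinary TD updates,
    # ideally learning faster than it would from an uninitialized table.

The same pattern applies when the representations differ more substantially, e.g., between different function approximators; the hard part is defining the mapping between the two representations.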

Published In

AAMAS '07: Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems
May 2007
1585 pages
ISBN: 9788190426275
DOI: 10.1145/1329125
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • IFAAMAS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Poster

Conference

AAMAS '07

Acceptance Rates

Overall acceptance rate: 1,155 of 5,036 submissions, 23%
