Abstract
Reinforcement learning applications are hampered by the tabula rasa approach taken by existing techniques. Transfer for reinforcement learning tackles this problem by enabling the reuse of previously learned behaviours. To be fully autonomous, a transfer agent has to: (1) automatically choose relevant source task(s) for a given target task, (2) learn how the tasks are related, and (3) transfer knowledge between the tasks effectively and efficiently. Currently, most transfer frameworks require substantial human intervention in at least one of these three steps. This discussion paper aims at: (1) positioning various knowledge-reuse algorithms as forms of transfer, and (2) arguing for the validity and feasibility of autonomous transfer by detailing potential solutions to the above three steps.
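To make the three steps concrete, the following is a minimal, hypothetical sketch of such a pipeline in generic Python; every name in it (autonomous_transfer, similarity, learn_mapping, transfer, learn, source_library) is an illustrative placeholder and not an interface taken from the paper.

# Hypothetical sketch of a fully autonomous transfer pipeline.
# All names below are illustrative placeholders, not the paper's API.
from typing import Any, Callable, List


def autonomous_transfer(
    source_library: List[Any],                 # previously solved tasks
    target_task: Any,                          # new task to be learned
    similarity: Callable[[Any, Any], float],   # step 1: task-similarity measure
    learn_mapping: Callable[[Any, Any], Any],  # step 2: inter-task mapping learner
    transfer: Callable[[Any, Any], Any],       # step 3: knowledge-transfer routine
    learn: Callable[[Any, Any], Any],          # standard RL on the target task
) -> Any:
    """Three-step transfer: select a source, relate it to the target, reuse it."""
    # (1) automatically choose the most relevant source task
    source_task = max(source_library, key=lambda s: similarity(s, target_task))

    # (2) learn the relation (inter-task mapping) between source and target
    mapping = learn_mapping(source_task, target_task)

    # (3) transfer the source knowledge and continue learning on the target
    initial_knowledge = transfer(source_task, mapping)
    return learn(target_task, initial_knowledge)

Step (1) is deliberately reduced to a single similarity maximisation here; in practice the selection criterion is itself one of the things that must be automated, which is exactly the issue the three steps above raise.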



Notes
In policy iteration algorithms, for example, the policy space is the space of all policies that can be learnt; it can be described by a combination of basis functions and parameterisations spanning different policies.
Such a setting is typical in continuous reinforcement learning. The reasons relate to: (1) the representation of the Q-function, and (2) the representations of the state and action spaces.
A typical criterion is to maximise the expected value of the total discounted pay-off signal; a standard formalisation is sketched after these notes.
Typically, n₂ ≪ n₁, since only a few transitions are available from the target task.
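The first and third notes above admit a standard textbook formalisation; the display below is a generic statement (with one common choice of parameterisation, a Gibbs/softmax policy over linear features φ with parameters θ, rewards r_{t+1} and discount factor γ), not an equation quoted from the paper:

\[
  \Pi \;=\; \bigl\{ \pi_{\theta} \;:\; \pi_{\theta}(a \mid s) \propto \exp\bigl(\theta^{\top}\phi(s,a)\bigr),\ \theta \in \mathbb{R}^{d} \bigr\},
  \qquad
  \theta^{\star} \in \arg\max_{\theta}\ \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; \pi_{\theta}\right],
  \quad 0 \le \gamma < 1 .
\]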
Cite this article
Bou Ammar, H., Chen, S., Tuyls, K. et al. Automated Transfer for Reinforcement Learning Tasks. Künstl Intell 28, 7–14 (2014). https://doi.org/10.1007/s13218-013-0286-8