Abstract
Although reinforcement learning is an effective machine learning technique, it can perform poorly on complex problems, especially real-world ones, where convergence is slow. This issue is magnified in continuous domains, where the curse of dimensionality is inevitable and generalization is essential. Transfer learning is a successful way to remedy this problem: it yields significant improvements in learning performance by providing generalization not only within a task but also across different but related or similar tasks. The critical issue in transfer learning is how to incorporate knowledge acquired from a different but related task learned in the past. Domain adaptation is a promising paradigm for addressing this challenge. In this paper, we propose a novel skill-based Transfer Learning with Domain Adaptation (TLDA) approach suitable for continuous RL problems. TLDA discovers and learns skills as high-level knowledge from the source task and then uses a domain adaptation technique to help the agent discover a state-action mapping relating the source and target tasks. With such a mapping, TLDA can adapt source skills and speed up learning on a new target task. Experimental results confirm that TLDA is an effective transfer learning method for continuous reinforcement learning problems.
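The pipeline summarized above (learn skills on a source task, learn a cross-task state mapping via domain adaptation, then reuse the skills through that mapping) can be sketched as follows. This is a minimal illustration, not the paper's exact method: the linear least-squares mapping, the function names, and the toy "skill" are all assumptions made for the sketch.

```python
import numpy as np

def learn_state_mapping(source_states, target_states):
    """Fit a linear map W so that source_states @ W approximates
    target_states (least squares). This stands in for the paper's
    domain-adaptation step, which learns the source-target relation."""
    W, *_ = np.linalg.lstsq(source_states, target_states, rcond=None)
    return W

def adapt_skill(skill_policy, W):
    """Wrap a source-task skill so it can act in the target task by
    mapping target states back into the source state space."""
    W_inv = np.linalg.pinv(W)

    def adapted_policy(target_state):
        source_state = target_state @ W_inv  # project into source space
        return skill_policy(source_state)    # reuse the source skill

    return adapted_policy

# Toy example: target states are a rescaled copy of source states.
rng = np.random.default_rng(0)
S = rng.normal(size=(100, 2))              # sampled source states
T = S * np.array([2.0, 0.5])               # corresponding target states

W = learn_state_mapping(S, T)
source_skill = lambda s: np.sign(s[..., 0])  # a trivial stand-in "skill"
target_skill = adapt_skill(source_skill, W)
```

With the mapping in hand, `target_skill` acts in the target task while internally consulting the source-task skill, which is the sense in which source knowledge speeds up target learning.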
Shoeleh, F., Asadpour, M. Skill based transfer learning with domain adaptation for continuous reinforcement learning domains. Appl Intell 50, 502–518 (2020). https://doi.org/10.1007/s10489-019-01527-z