
Skill based transfer learning with domain adaptation for continuous reinforcement learning domains

Applied Intelligence

Abstract

Although reinforcement learning is an effective machine learning technique, it can perform poorly on complex, real-world problems and converge slowly. This issue is magnified in continuous domains, where the curse of dimensionality is unavoidable and generalization is essential. Transfer learning is a successful remedy: it significantly improves learning performance by providing generalization not only within a task but also across different yet related or similar tasks. The critical issue in transfer learning is how to incorporate knowledge acquired while learning a different but related task in the past. Domain adaptation is a promising paradigm for addressing this challenge. In this paper, we propose a novel skill-based Transfer Learning with Domain Adaptation (TLDA) approach suitable for continuous RL problems. TLDA discovers and learns skills as high-level knowledge from the source task and then uses a domain adaptation technique to help the agent discover a state-action mapping that relates the source and target tasks. With such a mapping, TLDA can adapt the source skills and speed up learning on a new target task. The experimental results confirm that TLDA is an effective transfer learning method for continuous reinforcement learning problems.
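To make the idea above concrete, here is a minimal, runnable Python sketch of the pipeline the abstract describes: represent source skills, estimate a mapping between the source and target domains from sample data, and reuse the mapped skills in the target task. All class and function names, and the use of a simple least-squares mapping, are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the TLDA idea from the abstract (assumptions throughout):
# (1) keep source skills as simple parametric policies,
# (2) learn a linear state mapping between target and source domains,
# (3) express each source skill in the target domain to seed learning there.

import numpy as np


class Skill:
    """A source skill modeled as a linear policy over state features (assumption)."""
    def __init__(self, weights):
        self.weights = weights          # shape: (action_dim, state_dim)

    def act(self, state):
        return self.weights @ state     # deterministic action for a state


def learn_state_mapping(source_states, target_states):
    """Least-squares mapping M with M @ target_state ~ source_state.
    Stands in for the paper's domain adaptation step (illustrative only)."""
    # Solve target_states @ M.T ~ source_states in the least-squares sense.
    M, *_ = np.linalg.lstsq(target_states, source_states, rcond=None)
    return M.T


def adapt_skill(skill, state_map, action_map):
    """Express a source skill in the target domain: map the target state into
    the source space, query the source skill, then map the action back."""
    return Skill(action_map @ skill.weights @ state_map)


# Toy usage: 3-D source states, 2-D target states, 1-D actions in both domains.
rng = np.random.default_rng(0)
source_states = rng.normal(size=(100, 3))
target_states = source_states[:, :2] + 0.01 * rng.normal(size=(100, 2))

state_map = learn_state_mapping(source_states, target_states)   # shape (3, 2)
action_map = np.eye(1)                                           # identity here
source_skill = Skill(rng.normal(size=(1, 3)))
target_skill = adapt_skill(source_skill, state_map, action_map)
print(target_skill.act(target_states[0]))
```

In the paper itself, the domain adaptation step would replace this least-squares stand-in, and the adapted skills would be used to speed up a continuous RL learner on the target task rather than serve as a final policy.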



Author information


Corresponding author

Correspondence to Farzaneh Shoeleh.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Shoeleh, F., Asadpour, M. Skill based transfer learning with domain adaptation for continuous reinforcement learning domains. Appl Intell 50, 502–518 (2020). https://doi.org/10.1007/s10489-019-01527-z
