Abstract
In many reinforcement learning (RL) tasks, the state-action space may change over time (e.g., an increased number of observable features, or a change in the representation of actions). Given these changes, the previously learnt policy will likely fail due to the mismatch of input and output features, and another policy must be trained from scratch, which is inefficient in terms of sample complexity. Recent work in transfer learning has succeeded in making RL algorithms more efficient by incorporating knowledge from previous tasks, thus partially alleviating this problem. However, such methods typically require an explicit state-action correspondence from one task to the other. An autonomous agent may not have access to such high-level information, but should be able to analyze its experience to identify similarities between tasks. In this paper, we propose a novel method for automatically learning a correspondence of states and actions from one task to another through an agent's experience. In contrast to previous approaches, our method is based on two key insights: i) only the first states of the trajectories of the two tasks are paired, while the rest are unpaired and randomly collected, and ii) the transition model of the source task is used to predict the dynamics of the target task, thus aligning the unpaired states and actions. Additionally, this paper intentionally decouples the learning of the state-action correspondence from the transfer technique used, making it easy to combine with any transfer method. Our experiments demonstrate that our approach significantly accelerates transfer learning across a diverse set of problems, varying in state/action representation, physics parameters, and morphology, when compared to state-of-the-art algorithms that rely on cycle-consistency.
Data Availability
The datasets generated and/or analyzed during the current study are available in the GitHub repository, https://github.com/fjaviergp/learning_correspondence_paper
Notes
In RL, a trajectory refers to a sequence of states, actions, and rewards that an agent experiences while interacting with an environment over time.
Funding
This research was supported in part by AEI Grant PID2020-119367RB-I00 and PID2023-153341OB-I00.
Ethics declarations
Conflicts of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A Experiment setting
Table 3 displays the parameter settings used in the training of the source policy for each domain, spanning seven dimensions: the state space, \(\mathcal S\), the action space, \(\mathcal A\), the learning algorithm, the number of episodes, H, the maximum number of steps per episode, K, the learning rate, \(\alpha \), and the discount factor, \(\gamma \). The last two are fixed at \(\alpha =10^{-3}\) and \(\gamma =0.99\), respectively, for all domains.
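To make these settings concrete, the following is a minimal sketch of how the per-domain configuration of Table 3 could be organized in code. The domain name, algorithm, and episode counts are placeholders rather than values from the table; only \(\alpha \) and \(\gamma \) are taken from the text above.

```python
# Hypothetical layout of the per-domain settings summarized in Table 3.
# Domain name, algorithm, H, and K are placeholders, not values from the table.
SOURCE_POLICY_CONFIG = {
    "Swimmer": {
        "algorithm": "DDPG",   # learning algorithm (placeholder)
        "episodes_H": 2000,    # number of training episodes, H (placeholder)
        "max_steps_K": 1000,   # maximum steps per episode, K (placeholder)
    },
}

# Fixed for all domains, as stated in the text.
ALPHA = 1e-3   # learning rate
GAMMA = 0.99   # discount factor
```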
Table 4 presents the parameter settings used for the forward dynamics model in each domain. This model takes the current state and action as input and predicts the next state. The number of samples used to learn the forward dynamics models is the same as that used to learn the source policies.
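As an illustration of this component, the forward dynamics model can be implemented as a small regression network over the concatenated state and action, trained with a mean-squared error on transitions collected in the source task. The sketch below uses PyTorch with arbitrary layer sizes; neither the framework nor the architecture is specified by Table 4.

```python
import torch
import torch.nn as nn

class ForwardDynamicsModel(nn.Module):
    """Predicts the next state from the current state and action.
    Layer sizes and activations are illustrative, not the settings of Table 4."""
    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def train_dynamics(model, batches, lr=1e-3, epochs=100):
    """Supervised regression on (s, a, s') transitions from the source task."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for s, a, s_next in batches:             # mini-batches of tensors
            opt.zero_grad()
            loss = loss_fn(model(s, a), s_next)  # predicted vs. observed next state
            loss.backward()
            opt.step()
    return model
```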
Finally, Table 5 shows the network architectures for the functions \(\phi _{s}\), \(\phi _{a}\), and \(\phi _{a}^{-}\), including supplementary rows for each of the target tasks designed within the Swimmer domain. These rows are denoted Swimmer-X, where X is the number of limbs under consideration.
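To make the role of these functions concrete, the sketch below instantiates \(\phi _{s}\), \(\phi _{a}\), and \(\phi _{a}^{-}\) as small MLPs and couples them to a frozen source dynamics model (such as the one sketched above) through a dynamics-consistency loss, in the spirit of the alignment idea described in the abstract. The mapping directions, dimensions, and loss shown here are assumptions for illustration only; the actual architectures are given in Table 5 and the objective in the main text.

```python
import torch.nn as nn
import torch.nn.functional as F

# Example dimensions only; the real values depend on the source and target tasks.
SRC_STATE_DIM, SRC_ACTION_DIM = 8, 2
TGT_STATE_DIM, TGT_ACTION_DIM = 10, 3

def mlp(in_dim, out_dim, hidden=64):
    """Small fully connected network; the architectures in Table 5 may differ."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

# Assumed roles (not fixed by this section): phi_s maps target states to source
# states, phi_a maps source actions to target actions (used at transfer time),
# and phi_a_inv maps target actions back to source actions.
phi_s     = mlp(TGT_STATE_DIM, SRC_STATE_DIM)
phi_a     = mlp(SRC_ACTION_DIM, TGT_ACTION_DIM)
phi_a_inv = mlp(TGT_ACTION_DIM, SRC_ACTION_DIM)

def dynamics_alignment_loss(source_dynamics, s_t, a_t, s_t_next):
    """One plausible alignment term: the frozen source dynamics model should
    explain a target transition (s_t, a_t, s_t_next) once it is mapped into
    the source state-action space."""
    pred_next_src = source_dynamics(phi_s(s_t), phi_a_inv(a_t))
    return F.mse_loss(pred_next_src, phi_s(s_t_next))
```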
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
García, J., Rañó, I., Burés, J.M. et al. Learning state-action correspondence across reinforcement learning control tasks via partially paired trajectories. Appl Intell 55, 219 (2025). https://doi.org/10.1007/s10489-024-06190-7