Learning state-action correspondence across reinforcement learning control tasks via partially paired trajectories

Abstract

In many reinforcement learning (RL) tasks, the state-action space may change over time (e.g., an increased number of observable features, or a change in the representation of actions). Given these changes, the previously learnt policy will likely fail because its input and output features no longer match, and another policy must be trained from scratch, which is inefficient in terms of sample complexity. Recent work in transfer learning has made RL algorithms more efficient by incorporating knowledge from previous tasks, thus partially alleviating this problem. However, such methods typically require an explicit state-action correspondence from one task to the other. An autonomous agent may not have access to such high-level information, but it should be able to analyze its experience to identify similarities between tasks. In this paper, we propose a novel method for automatically learning a correspondence of states and actions from one task to another through an agent’s experience. In contrast to previous approaches, our method is based on two key insights: i) only the first state of the trajectories of the two tasks is paired, while the rest are unpaired and randomly collected, and ii) the transition model of the source task is used to predict the dynamics of the target task, thus aligning the unpaired states and actions. Additionally, this paper intentionally decouples the learning of the state-action correspondence from the transfer technique used, making it easy to combine with any transfer method. Our experiments demonstrate that our approach significantly accelerates transfer learning across a diverse set of problems, varying in state/action representation, physics parameters, and morphology, when compared to state-of-the-art algorithms that rely on cycle-consistency.
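
To make the second insight more concrete, the following is a minimal sketch of how a frozen source-task dynamics model could be used to align unpaired target transitions through learned state and action maps, with a handful of paired initial states providing anchoring supervision. All names, dimensions, network sizes, and the loss weighting are illustrative assumptions and do not correspond to the authors' implementation.

```python
# Illustrative sketch (not the paper's code): aligning unpaired target
# transitions with a frozen source-task dynamics model. Dimensions,
# network sizes, and the loss weighting are arbitrary assumptions.
import torch
import torch.nn as nn

S_SRC, A_SRC = 4, 1      # source state/action dimensions (assumed)
S_TGT, A_TGT = 6, 2      # target state/action dimensions (assumed)

def mlp(inp, out, hidden=64):
    return nn.Sequential(nn.Linear(inp, hidden), nn.ReLU(),
                         nn.Linear(hidden, out))

phi_s = mlp(S_TGT, S_SRC)          # target state  -> source state
phi_a = mlp(A_TGT, A_SRC)          # target action -> source action
f_src = mlp(S_SRC + A_SRC, S_SRC)  # stand-in for the pretrained source dynamics model
for p in f_src.parameters():       # the source model stays frozen
    p.requires_grad_(False)

opt = torch.optim.Adam(list(phi_s.parameters()) + list(phi_a.parameters()),
                       lr=1e-3)

# Random stand-ins for the data described in the abstract:
# unpaired target transitions (s, a, s') and a few paired initial states.
s, a, s_next = torch.randn(256, S_TGT), torch.randn(256, A_TGT), torch.randn(256, S_TGT)
s0_tgt, s0_src = torch.randn(32, S_TGT), torch.randn(32, S_SRC)

for step in range(1000):
    # Dynamics alignment: the mapped target transition should be
    # consistent with the source transition model.
    pred_next = f_src(torch.cat([phi_s(s), phi_a(a)], dim=-1))
    loss_dyn = ((pred_next - phi_s(s_next)) ** 2).mean()
    # Supervision from the (few) paired initial states.
    loss_pair = ((phi_s(s0_tgt) - s0_src) ** 2).mean()
    loss = loss_dyn + loss_pair
    opt.zero_grad()
    loss.backward()
    opt.step()
```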

Data Availability

The datasets generated and/or analyzed during the current study are available in the Git repository at https://github.com/fjaviergp/learning_correspondence_paper

Notes

  1. In RL, a trajectory refers to a sequence of states, actions, and rewards that an agent experiences while interacting with an environment over time.
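
     For concreteness, the snippet below collects one such trajectory from a toy, hypothetical environment; DummyEnv and the random behaviour policy are stand-ins used only to keep the example self-contained.

```python
# Hypothetical example of a trajectory as a list of (state, action, reward)
# tuples; DummyEnv is a stand-in, not an environment used in the paper.
import random

class DummyEnv:
    def reset(self):
        self.t = 0
        return 0.0                      # initial state
    def step(self, action):
        self.t += 1
        next_state = self.t + action    # toy dynamics
        reward = -abs(next_state)       # toy reward
        done = self.t >= 5
        return next_state, reward, done

env = DummyEnv()
state, trajectory, done = env.reset(), [], False
while not done:
    action = random.choice([-1, 1])     # random behaviour policy
    next_state, reward, done = env.step(action)
    trajectory.append((state, action, reward))
    state = next_state
print(trajectory)  # e.g. [(0.0, 1, -2), (2, -1, -1), ...]
```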

Funding

This research was supported in part by AEI Grants PID2020-119367RB-I00 and PID2023-153341OB-I00.

Author information

Corresponding author

Correspondence to Javier García.

Ethics declarations

Conflicts of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A    Experiment setting

Table 3 displays the parameter settings used to train the source policy in each domain, spanning seven dimensions: the state space \(\mathcal S\), the action space \(\mathcal A\), the learning algorithm, the number of episodes \(H\), the maximum number of steps per episode \(K\), the learning rate \(\alpha \), and the discount factor \(\gamma \). The learning rate and the discount factor are set to \(\alpha =10^{-3}\) and \(\gamma =0.99\), respectively.

Table 3 Parameter settings for learning the source policy
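
A configuration along the following lines could capture the columns of Table 3; only \(\alpha \) and \(\gamma \) are taken from the text above, and every other value shown is a placeholder rather than a setting actually used in the paper.

```python
# Hypothetical per-domain training configuration mirroring the columns of
# Table 3. alpha and gamma come from the text; the remaining values are
# placeholders, not the settings actually used in the paper.
from dataclasses import dataclass

@dataclass
class SourcePolicyConfig:
    domain: str          # e.g. "Swimmer"
    state_dim: int       # |S|
    action_dim: int      # |A|
    algorithm: str       # learning algorithm (placeholder)
    episodes: int        # H
    max_steps: int       # K, maximum steps per episode
    alpha: float = 1e-3  # learning rate (from the text)
    gamma: float = 0.99  # discount factor (from the text)

# Placeholder instantiation (values are illustrative only).
cfg = SourcePolicyConfig(domain="Swimmer", state_dim=8, action_dim=2,
                         algorithm="DDPG", episodes=1000, max_steps=200)
```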

Table 4 presents the parameter settings employed for the forward dynamics model in each domain. This model takes the current state and action as input and predicts the next state. The number of samples used to learn the forward dynamics models is the same as that used to learn the source policies.
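
A minimal sketch of such a forward dynamics model is given below, assuming a small fully connected network trained by regression on (state, action, next state) samples; the dimensions and architecture are placeholders, as the actual settings are those reported in Table 4.

```python
# Minimal sketch of a forward dynamics model f(s, a) -> s', trained by
# regression on transition samples. Sizes and architecture are assumptions;
# the settings actually used are those reported in Table 4.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 4, 1   # placeholder dimensions

model = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, STATE_DIM),             # predicts the next state
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Random stand-ins for transitions collected in the source task.
states = torch.randn(1024, STATE_DIM)
actions = torch.randn(1024, ACTION_DIM)
next_states = torch.randn(1024, STATE_DIM)

for epoch in range(100):
    pred = model(torch.cat([states, actions], dim=-1))
    loss = loss_fn(pred, next_states)
    opt.zero_grad()
    loss.backward()
    opt.step()
```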

Finally, Table 5 shows the network architectures for the functions \(\phi _{s}\), \(\phi _{a}\), and \(\phi _{a}^{-}\). It also includes supplementary rows for each of the target tasks designed within the Swimmer domain; these rows are denoted Swimmer-X, with X representing the number of limbs under consideration.
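
Independently of the specific architectures in Table 5, the sketch below illustrates how the three mappings could fit together at transfer time: the target state is mapped through \(\phi _{s}\), the source policy acts in the source space, and the resulting action is mapped back with \(\phi _{a}^{-}\). All networks, dimensions, and the stand-in policy are placeholders, not the paper's implementation.

```python
# Illustrative use of the learned mappings at transfer time: the target
# state is mapped into the source state space, the source policy acts
# there, and the resulting action is mapped back to the target action
# space. All networks, dimensions, and the policy are placeholders.
import torch
import torch.nn as nn

S_SRC, A_SRC, S_TGT, A_TGT = 4, 1, 6, 2   # placeholder dimensions

def mlp(inp, out, hidden=64):
    return nn.Sequential(nn.Linear(inp, hidden), nn.ReLU(),
                         nn.Linear(hidden, out))

phi_s = mlp(S_TGT, S_SRC)       # target state  -> source state
phi_a = mlp(A_TGT, A_SRC)       # target action -> source action (used when learning the correspondence)
phi_a_inv = mlp(A_SRC, A_TGT)   # source action -> target action (phi_a^-)
pi_src = mlp(S_SRC, A_SRC)      # stand-in for the pretrained source policy

def act_in_target(s_target: torch.Tensor) -> torch.Tensor:
    """Select a target-task action by reusing the source policy."""
    with torch.no_grad():
        s_source = phi_s(s_target)      # translate the observation
        a_source = pi_src(s_source)     # query the source policy
        return phi_a_inv(a_source)      # translate the action back

print(act_in_target(torch.randn(1, S_TGT)).shape)  # torch.Size([1, 2])
```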

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

García, J., Rañó, I., Burés, J.M. et al. Learning state-action correspondence across reinforcement learning control tasks via partially paired trajectories. Appl Intell 55, 219 (2025). https://doi.org/10.1007/s10489-024-06190-7

