Abstract
Solving long-horizon problems is a desirable property of autonomous agents. Learning reusable behaviours can equip an agent with this property, allowing it to adapt them when performing various real-world tasks. Our approach for learning these behaviours is composed of three modules operating on two separate timescales, and it uses a hierarchical model with both discrete and continuous variables. This modular structure allows each stage to be trained independently. The stages are organized in a two-level temporal hierarchy: the first level contains the planner, responsible for issuing the skills to be executed, while the second level executes them. At this lower level, to achieve the desired behaviour, the discrete skill is converted into a continuous vector encoding the environment change that must occur. With this approach, we aim to solve long-horizon sequential tasks with delayed rewards. In contrast to existing work, our method uses both variable types, allowing an agent to learn high-level behaviours composed of an interpretable set of skills. The discrete skills are thus easy to compose, while the continuous representations retain the flexibility to execute them in several different ways. Using a 2D scenario in which the agent must catch a set of objects in a specific order, we demonstrate that our approach scales to increasingly long tasks.
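To make the two-timescale control flow concrete, the following minimal Python sketch illustrates the structure the abstract describes: a high-level planner issues a discrete skill, the skill is mapped to a continuous vector encoding the desired environment change, and a low-level policy executes it for several steps. All names, dimensions, and the random planner/policy stand-ins are hypothetical illustrations, not the authors' trained models.

```python
import numpy as np

# Hypothetical dimensions; the paper does not specify these values.
NUM_SKILLS = 4   # discrete skills the high-level planner can issue
SKILL_DIM = 8    # continuous vector encoding the desired environment change
STATE_DIM = 6
ACTION_DIM = 2
HORIZON = 10     # low-level steps executed per high-level decision

rng = np.random.default_rng(0)

# Stand-in for the learned skill-to-vector mapping.
skill_embedding = rng.normal(size=(NUM_SKILLS, SKILL_DIM))

def planner(state):
    """High level (slow timescale): choose the next discrete skill.
    A random placeholder for the learned planner."""
    return rng.integers(NUM_SKILLS)

def low_level_policy(state, skill_vec):
    """Low level (fast timescale): act conditioned on the state and
    the continuous skill vector. A placeholder for the learned policy."""
    features = np.concatenate([state, skill_vec])
    return np.tanh(features[:ACTION_DIM])

def env_step(state, action):
    """Placeholder environment transition."""
    return state + 0.1 * np.pad(action, (0, STATE_DIM - ACTION_DIM))

# Two-timescale rollout: the planner issues a skill, the low level executes it.
state = np.zeros(STATE_DIM)
for _ in range(3):                      # three high-level decisions
    skill = planner(state)              # discrete, interpretable choice
    z = skill_embedding[skill]          # continuous execution vector
    for _ in range(HORIZON):            # fast timescale
        action = low_level_policy(state, z)
        state = env_step(state, action)
```

Separating the discrete choice from its continuous encoding mirrors the paper's motivation: the planner composes skills as interpretable symbols, while the continuous vector leaves room for executing each skill in different ways.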
Acknowledgement
This research was developed within the scope of the PhD grant [2020.05789.BD], funded by FCT - Foundation for Science and Technology. This study was also supported by IEETA - Institute of Electronics and Informatics Engineering of Aveiro, funded by National Funds through the FCT - Foundation for Science and Technology, in the context of the project [UIDB/00127/2020].