
Skill Learning for Long-Horizon Sequential Tasks

  • Conference paper
Progress in Artificial Intelligence (EPIA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13566)


Abstract

The ability to solve long-horizon problems is a desirable property of autonomous agents. Learning reusable behaviours can equip an agent with this property, allowing it to adapt those behaviours to various real-world tasks. Our approach to learning these behaviours is composed of three modules, operating on two separate timescales, and uses a hierarchical model with both discrete and continuous variables. This modular structure allows each stage to be trained independently. The stages are organized in a two-level temporal hierarchy: the first level contains the planner, responsible for issuing the skills to be executed, while the second level executes them. At this second level, to achieve the desired skill behaviour, the discrete skill is converted to a continuous vector encoding the environment change that must occur. With this approach, we aim to solve long-horizon sequential tasks with delayed rewards. In contrast to existing work, our method uses both variable types, allowing an agent to learn high-level behaviours consisting of an interpretable set of skills. The discrete skills are easy to compose, while the continuous representations retain the flexibility to execute each skill in several different ways. Using a 2D scenario in which the agent must catch a set of objects in a specific order, we demonstrate that our approach scales to increasingly longer tasks.
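To make the two-level hierarchy concrete, the sketch below outlines how a discrete skill issued by a high-level planner could be decoded into a continuous vector and executed by a low-level policy on a faster timescale. All names here (Planner, SkillDecoder, low_level_policy, run_episode) are hypothetical illustrations of the idea, not the paper's actual implementation.

    # Illustrative sketch of the two-level hierarchy (hypothetical names;
    # the paper's actual implementation is not published on this page).
    import numpy as np

    class Planner:
        """High level, slow timescale: issues the next discrete skill."""
        def __init__(self, skill_sequence):
            self.skill_sequence = list(skill_sequence)

        def next_skill(self, k):
            # In the paper this level is a learned planner; here it
            # simply replays a fixed ordering of skills.
            return self.skill_sequence[k % len(self.skill_sequence)]

    class SkillDecoder:
        """Maps a discrete skill id to a continuous vector describing
        the environment change the skill should bring about."""
        def __init__(self, num_skills, latent_dim, seed=0):
            rng = np.random.default_rng(seed)
            self.embeddings = rng.normal(size=(num_skills, latent_dim))

        def decode(self, skill_id):
            return self.embeddings[skill_id]

    def low_level_policy(obs, skill_vector):
        # Stand-in for the learned low-level policy: a 2-D action
        # computed from the observation and the continuous skill vector.
        return np.tanh(obs[:2] + skill_vector[:2])

    def run_episode(env_step, obs, planner, decoder,
                    horizon=30, skill_len=10):
        """Fast timescale: act every step; slow timescale: re-plan
        every `skill_len` steps by decoding a new discrete skill."""
        for t in range(horizon):
            if t % skill_len == 0:
                z = decoder.decode(planner.next_skill(t // skill_len))
            obs = env_step(low_level_policy(obs, z), obs)
        return obs

    if __name__ == "__main__":
        planner = Planner(skill_sequence=[0, 1, 2])  # e.g. "catch object i"
        decoder = SkillDecoder(num_skills=3, latent_dim=4)
        step = lambda a, o: o + 0.1 * np.pad(a, (0, o.size - a.size))
        print(run_episode(step, np.zeros(4), planner, decoder))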


Notes

  1. Stable-Baselines3 PPO.
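The footnote refers to the PPO implementation from the Stable-Baselines3 library. A minimal usage sketch follows; the environment id is a placeholder, since the paper's custom 2D object-catching environment is not available on this page.

    from stable_baselines3 import PPO

    # "CartPole-v1" is a placeholder environment id, not the paper's
    # custom 2D object-catching environment.
    model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
    model.learn(total_timesteps=50_000)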



Acknowledgement

This research was developed within the scope of the PhD grant [2020.05789.BD], funded by FCT - Foundation for Science and Technology. This study was also supported by IEETA - Institute of Electronics and Informatics Engineering of Aveiro, funded by National Funds through the FCT - Foundation for Science and Technology, in the context of the project [UIDB/00127/2020].

Author information


Correspondence to João Alves.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Alves, J., Lau, N., Silva, F. (2022). Skill Learning for Long-Horizon Sequential Tasks. In: Marreiros, G., Martins, B., Paiva, A., Ribeiro, B., Sardinha, A. (eds) Progress in Artificial Intelligence. EPIA 2022. Lecture Notes in Computer Science, vol 13566. Springer, Cham. https://doi.org/10.1007/978-3-031-16474-3_58


  • DOI: https://doi.org/10.1007/978-3-031-16474-3_58

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16473-6

  • Online ISBN: 978-3-031-16474-3

  • eBook Packages: Computer Science, Computer Science (R0)
