Abstract
Solving long-horizon problems is a desirable property of autonomous agents. Learning reusable behaviours can equip an agent with this property, allowing it to adapt them when performing various real-world tasks. Our approach for learning these behaviours is composed of three modules operating on two separate timescales, and it uses a hierarchical model with both discrete and continuous variables. This modular structure allows each stage to be trained independently. The stages are organized in a two-level temporal hierarchy: the first level contains the planner, responsible for issuing the skills to be executed, while the second level executes them. At this lower level, to achieve the desired behaviour, the discrete skill is converted into a continuous vector encoding the environment change that must occur. With this approach, we aim to solve long-horizon sequential tasks with delayed rewards. In contrast to existing work, our method uses both variable types, allowing an agent to learn high-level behaviours composed of an interpretable set of skills. The discrete skills are thus easy to compose, while the continuous representations retain the flexibility to execute them in several different ways. Using a 2D scenario in which the agent must catch a set of objects in a specific order, we demonstrate that our approach scales to increasingly long tasks.
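To make the two-timescale control flow concrete, the following minimal Python sketch illustrates the structure the abstract describes: a high-level planner issues a discrete skill, the skill is mapped to a continuous vector encoding the desired environment change, and a low-level policy executes it for several steps. All names, dimensions, and the random planner/policy stand-ins are hypothetical illustrations, not the authors' trained models.

```python
import numpy as np

# Hypothetical dimensions; the paper does not specify these values.
NUM_SKILLS = 4   # discrete skills the high-level planner can issue
SKILL_DIM = 8    # continuous vector encoding the desired environment change
STATE_DIM = 6
ACTION_DIM = 2
HORIZON = 10     # low-level steps executed per high-level decision

rng = np.random.default_rng(0)

# Stand-in for the learned skill-to-vector mapping.
skill_embedding = rng.normal(size=(NUM_SKILLS, SKILL_DIM))

def planner(state):
    """High level (slow timescale): choose the next discrete skill.
    A random placeholder for the learned planner."""
    return rng.integers(NUM_SKILLS)

def low_level_policy(state, skill_vec):
    """Low level (fast timescale): act conditioned on the state and
    the continuous skill vector. A placeholder for the learned policy."""
    features = np.concatenate([state, skill_vec])
    return np.tanh(features[:ACTION_DIM])

def env_step(state, action):
    """Placeholder environment transition."""
    return state + 0.1 * np.pad(action, (0, STATE_DIM - ACTION_DIM))

# Two-timescale rollout: the planner issues a skill, the low level executes it.
state = np.zeros(STATE_DIM)
for _ in range(3):                      # three high-level decisions
    skill = planner(state)              # discrete, interpretable choice
    z = skill_embedding[skill]          # continuous execution vector
    for _ in range(HORIZON):            # fast timescale
        action = low_level_policy(state, z)
        state = env_step(state, action)
```

Separating the discrete choice from its continuous encoding mirrors the paper's motivation: the planner composes skills as interpretable symbols, while the continuous vector leaves room for executing each skill in different ways.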
Acknowledgement
This research was developed within the scope of the PhD grant [2020.05789.BD], funded by FCT - Foundation for Science and Technology. This study was also supported by IEETA - Institute of Electronics and Informatics Engineering of Aveiro, funded by National Funds through the FCT - Foundation for Science and Technology, in the context of the project [UIDB/00127/2020].