Deep latent-space sequential skill chaining from incomplete demonstrations

Kang, Minjae; Oh, Songhwai

doi:10.1007/s11370-021-00409-z


Original Research Paper
Published: 01 March 2022

Volume 15, pages 203–213, (2022)

Intelligent Service Robotics

Minjae Kang¹ & Songhwai Oh¹


Abstract

Imitation learning is a methodology that trains an agent using demonstrations from skilled experts without external rewards. However, for a complex task with a long horizon, it is challenging to obtain data that exactly match the desired task. In general, humans can easily specify a sequence of simple tasks for performing a complex task. If a person gives an agent an ordering of simple tasks to carry out a complex task, we can find a skill sequence efficiently by learning the corresponding skills. However, independently trained low-level skills (simple tasks) are incompatible, so they cannot be performed in sequence without additional refinement. In this context, we propose a method to create a skill chain by connecting independently learned skills. To connect two consecutive low-level policies, we need to find a new policy, defined as a bridge skill. Training a bridge skill requires a well-designed reward function, but in the real world, only sparse rewards based on the success of the overall task can be given. To address this issue, we introduce a novel latent-distance reward function derived from fragmented demonstrations. We also use binary classifiers to determine whether the current state allows the following skill to be performed. As a result, the skill chain formed from incomplete demonstrations can successfully perform complex tasks that require executing multiple skills in sequence. In the experiments, we solve manipulation tasks with RGBD images as input in a Baxter simulator implemented using MuJoCo. We verify that skill chains can be successfully trained from incomplete data and confirm that the proposed latent-distance rewards make training much more efficient and stable. We also perform block stacking with a real Baxter robot in a simple set-up environment.
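The abstract names two ingredients, a latent-distance reward for training bridge skills and binary classifiers that decide when the next skill may begin, but does not give their exact form. The sketch below is one minimal reading under stated assumptions, not the authors' implementation: the learned latent encoder (the paper works from RGBD images) is replaced by an identity map, the reward is assumed to be the negative Euclidean distance to the nearest demonstrated start latent of the next skill, and the classifier is plain logistic regression. All names (`latent_distance_reward`, `InitiationClassifier`, `run_chain`) and the toy skills are hypothetical.

```python
import numpy as np


def latent_distance_reward(z, target_latents):
    """Hypothetical dense reward for a bridge skill (assumed form).

    Negative Euclidean distance from the current latent z to the closest
    latent among demonstrated start states of the next skill.
    """
    target = np.asarray(target_latents, dtype=float)
    dists = np.linalg.norm(target - np.asarray(z, dtype=float), axis=1)
    return -float(dists.min())


class InitiationClassifier:
    """Hypothetical binary classifier: can the next skill start from latent z?

    Plain logistic regression fitted by batch gradient descent (an assumption;
    the abstract only says "binary classifiers").
    """

    def __init__(self, dim, lr=0.5, epochs=500):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, z):
        logits = np.asarray(z, dtype=float) @ self.w + self.b
        return 1.0 / (1.0 + np.exp(-logits))

    def fit(self, Z, y):
        Z = np.asarray(Z, dtype=float)
        y = np.asarray(y, dtype=float)
        for _ in range(self.epochs):
            err = self.predict_proba(Z) - y  # gradient of log-loss w.r.t. logits
            self.w -= self.lr * Z.T @ err / len(y)
            self.b -= self.lr * err.mean()
        return self

    def can_start(self, z, threshold=0.5):
        return bool(self.predict_proba(z) >= threshold)


def run_chain(state, skills, bridges, classifiers, encoder=lambda s: s,
              max_bridge_steps=50):
    """Execute skills in order, running a bridge between consecutive skills.

    bridges[i - 1] connects skill i - 1 to skill i; classifiers[i] gates
    skill i (classifiers[0] is unused). The identity encoder stands in for a
    learned image encoder. Returns (final_state, success).
    """
    for i, skill in enumerate(skills):
        if i > 0:
            steps = 0
            while not classifiers[i].can_start(encoder(state)):
                if steps >= max_bridge_steps:
                    return state, False  # bridge never reached a valid start
                state = bridges[i - 1](state)
                steps += 1
        state = skill(state)
    return state, True


if __name__ == "__main__":
    # Toy 2-D setting: skill B may only start where x > 0.
    clf = InitiationClassifier(dim=2).fit(
        [[-2, 0], [-1, 0], [1, 0], [2, 0]], [0, 0, 1, 1])
    reach = lambda s: np.array([-2.0, 0.0])      # toy skill A
    lift = lambda s: s + np.array([0.0, 1.0])    # toy skill B
    bridge = lambda s: s + np.array([1.5, 0.0])  # toy bridge policy
    final, ok = run_chain(np.zeros(2), [reach, lift], [bridge], [None, clf])
    print(final, ok)  # final == [1.0, 1.0], ok == True
```

The distance-based reward gives the bridge a learning signal at every step rather than only on overall task success, which is the sparse-reward problem the abstract describes; the classifier gate decides when control is handed to the next skill. In a real system the bridge would be a policy trained to maximize this reward rather than the fixed step used here.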


Fig. 1



Acknowledgements

This work was supported in part by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01371, Development of Brain-Inspired AI with Human-Like Intelligence), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017R1A2B2006136).

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering and ASRI, Seoul National University, Seoul, 08826, Korea
Minjae Kang & Songhwai Oh


Corresponding author

Correspondence to Songhwai Oh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Kang, M., Oh, S. Deep latent-space sequential skill chaining from incomplete demonstrations. Intel Serv Robotics 15, 203–213 (2022). https://doi.org/10.1007/s11370-021-00409-z


Received: 22 July 2021
Accepted: 30 December 2021
Published: 01 March 2022
Issue Date: April 2022
DOI: https://doi.org/10.1007/s11370-021-00409-z
