
Deep latent-space sequential skill chaining from incomplete demonstrations

Original Research Paper

Published in Intelligent Service Robotics

Abstract

Imitation learning trains an agent using demonstrations from skilled experts, without external rewards. However, for a complex, long-horizon task, it is difficult to obtain data that exactly match the desired task. In general, humans can easily decompose a complex task into a sequence of simple tasks. If a person gives an agent the order of simple tasks needed to carry out a complex task, a skill sequence can be found efficiently by learning the corresponding skills. However, independently trained low-level skills (simple tasks) are mutually incompatible, so they cannot be executed in sequence without additional refinement. In this context, we propose a method that creates a skill chain by connecting independently learned skills. To connect two consecutive low-level policies, we need to find a new policy, which we define as a bridge skill. Training a bridge skill requires a well-designed reward function, but in the real world only a sparse reward, given according to the success of the overall task, is available. To address this issue, we introduce a novel latent-distance reward function derived from fragmented demonstrations. We also use binary classifiers to determine whether the skill that follows can be performed from the current state. As a result, a skill chain formed from incomplete demonstrations can successfully perform complex tasks that require executing multiple skills in sequence. In the experiments, we solve manipulation tasks with RGB-D images as input in a Baxter simulator implemented in MuJoCo. We verify that skill chains can be successfully trained from incomplete data and confirm that the agent trains much more efficiently and stably with the proposed latent-distance reward. We also perform block stacking with a real Baxter robot in a simplified setup.
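The abstract describes two components: a latent-distance reward that densifies the sparse task-success signal using fragmented demonstrations, and binary classifiers that decide when control can be handed from a bridge skill to the next skill. The sketch below is not taken from the paper; it is a minimal illustration under assumed choices (states already encoded into a latent space by a pretrained encoder, a nearest-neighbour distance used as the reward, and a distance-threshold rule standing in for the learned classifiers), and all function and variable names are hypothetical.

import numpy as np

def latent_distance_reward(z_current, z_demo_next_skill, scale=1.0):
    """Dense surrogate reward for training a bridge skill.

    z_current:          (d,) latent code of the current observation.
    z_demo_next_skill:  (n, d) latent codes of fragmented demonstration
                        states from which the next skill is known to start.
    Returns the negative distance to the closest demonstration latent, so the
    reward grows as the agent approaches states where the next skill can begin.
    """
    dists = np.linalg.norm(z_demo_next_skill - z_current, axis=1)
    return -scale * dists.min()

class InitiationClassifier:
    """Binary decision: can the next skill be executed from the current
    latent state? Here a simple nearest-neighbour threshold rule is used as a
    stand-in for the paper's learned classifiers; a small neural network could
    replace it without changing the interface."""

    def __init__(self, positive_latents, threshold):
        self.positive_latents = positive_latents  # latents where the next skill succeeded
        self.threshold = threshold

    def can_execute(self, z_current):
        d = np.linalg.norm(self.positive_latents - z_current, axis=1).min()
        return d < self.threshold

# Usage sketch: while rolling out a bridge skill, reward progress toward the
# initiation region of the next skill and hand over control once the
# classifier fires. All data below are synthetic placeholders.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo_latents = rng.normal(size=(32, 8))   # hypothetical demo encodings
    gate = InitiationClassifier(demo_latents, threshold=1.5)
    z = rng.normal(size=8)                    # hypothetical current encoding
    r = latent_distance_reward(z, demo_latents)
    print("bridge-skill reward:", r, "| switch to next skill:", gate.can_execute(z))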





Acknowledgements

This work was supported in part by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01371, Development of Brain-Inspired AI with Human-Like Intelligence), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017R1A2B2006136).

Author information

Corresponding author

Correspondence to Songhwai Oh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Kang, M., Oh, S. Deep latent-space sequential skill chaining from incomplete demonstrations. Intel Serv Robotics 15, 203–213 (2022). https://doi.org/10.1007/s11370-021-00409-z

