Abstract
Imitation learning trains an agent from demonstrations by skilled experts, without external rewards. For a complex, long-horizon task, however, it is challenging to obtain data that exactly match the desired task. Humans, in general, can easily decompose a complex task into a sequence of simple tasks. If a person gives an agent an ordered list of simple tasks that together accomplish a complex task, a skill sequence can be found efficiently by learning the corresponding skills. However, independently trained low-level skills (simple tasks) are not compatible with one another, so they cannot be executed in sequence without additional refinement. In this context, we propose a method that creates a skill chain by connecting independently learned skills. To connect two consecutive low-level policies, we learn a new policy, which we call a bridge skill. Training a bridge skill requires a well-designed reward function, but in the real world only sparse rewards are available, given according to the success of the overall task. To address this issue, we introduce a novel latent-distance reward function derived from fragmented demonstrations. We also use binary classifiers to determine whether the skill that follows can be performed from the current state. As a result, a skill chain formed from incomplete demonstrations can successfully perform complex tasks that require executing multiple skills in sequence. In the experiments, we solve manipulation tasks with RGBD images as input in a Baxter simulator implemented in MuJoCo. We verify that skill chains can be trained successfully from incomplete data, and confirm that the proposed latent-distance rewards make training much more efficient and stable. We also perform block stacking with a real Baxter robot in a simple setup.
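The abstract does not state the exact form of the latent-distance reward or the procedure for executing the chain, so the following Python sketch is only one possible reading. It assumes that states have already been mapped to latent vectors by some encoder, that the reward is the negative Euclidean distance to the nearest latent state taken from the demonstrations of the next skill, and that each binary classifier is a callable returning whether the next skill can start. The names `latent_distance_reward`, `run_chain`, `can_start`, `step`, and `max_bridge_steps` are hypothetical and do not come from the paper.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Vector = Sequence[float]


def latent_distance_reward(z: Vector, demo_latents: List[Vector]) -> float:
    """Dense reward for a bridge skill (assumed form, not the paper's definition).

    Returns the negative Euclidean distance between the current latent state
    `z` and the nearest latent state from the next skill's demonstrations.
    The reward approaches 0 as the agent approaches demonstrated states.
    """
    return -min(math.dist(z, d) for d in demo_latents)


def run_chain(
    state: Any,
    skills: List[Callable[[Any], Any]],
    bridges: List[Callable[[Any], Any]],
    can_start: List[Callable[[Any], bool]],
    step: Callable[[Any, Any], Any],
    max_bridge_steps: int = 50,
) -> Tuple[Any, bool]:
    """Execute skills in order, inserting a bridge skill before each one.

    skills[i]    : runs skill i to completion and returns the resulting state.
    bridges[i]   : bridge policy that maps a state to an action.
    can_start[i] : binary classifier; True if skill i can start from the state.
    step         : environment transition, (state, action) -> next state.
    Returns the final state and whether the whole chain was executed.
    """
    for i, skill in enumerate(skills):
        steps = 0
        # Run the bridge policy until the classifier accepts the state.
        while not can_start[i](state):
            if steps >= max_bridge_steps:
                return state, False  # bridge failed to reach a valid start state
            state = step(state, bridges[i](state))
            steps += 1
        state = skill(state)
    return state, True
```

In this reading, the classifier decides when the bridge skill hands control to the next skill, while the latent-distance term would supply a dense training signal for the bridge policy in place of the sparse task-success reward.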
Acknowledgements
This work was supported in part by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01371, Development of Brain-Inspired AI with Human-Like Intelligence), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (NRF-2017R1A2B2006136).
About this article
Cite this article
Kang, M., Oh, S. Deep latent-space sequential skill chaining from incomplete demonstrations. Intel Serv Robotics 15, 203–213 (2022). https://doi.org/10.1007/s11370-021-00409-z
DOI: https://doi.org/10.1007/s11370-021-00409-z