Imitation Learning of Long-Horizon Manipulation Tasks Through Temporal Sub-action Sequencing

Singh, Niharika; Dutta, Samrat; Jain, Aditya; Prakash, Ravi; Majumder, Anima; Sinha, Rajesh; Behera, Laxmidhar; Sandhan, Tushar

doi:10.1007/978-3-031-58174-8_30

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2010)

Included in the following conference series:

International Conference on Computer Vision and Image Processing

Abstract

This research proposes an approach to long-horizon manipulation that uses video and kinesthetic demonstrations to imitate human actions. Task learning proceeds in two stages. In the first stage, the Task Sequencing Network (TSNet), a hybrid neural network that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) with a Connectionist Temporal Classification (CTC) loss, learns the sequence of sub-actions in the video demonstration. In the second stage, task-agnostic task primitives are learned from kinesthetic demonstrations as dynamic movement primitive (DMP) models. To encode the semantic relationship between the sub-actions and the objects, a Multi-relational Embedding Network (MRE), with YOLOv4 for object detection, estimates the affordances associated with the objects in the scene. For tasks such as liquid pouring, table cleaning, and object placement, the proposed imitation learning approach learns task planning and execution in a decoupled manner, resulting in effective sub-action sequencing and quicker, more precise learning of sub-action execution.
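As a rough illustration of how a CTC-trained network can turn per-frame predictions into a sub-action sequence, the sketch below applies standard best-path (greedy) CTC decoding: take the most likely class per frame, merge consecutive repeats, and drop the blank symbol. This is only a minimal sketch; the abstract does not state which decoding scheme TSNet uses, and the sub-action vocabulary, the blank index, and the per-frame scores are all hypothetical values chosen for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (not specified in the abstract)


def ctc_greedy_decode(frame_scores, labels, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring class index for every video frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # Collapse runs of identical indices, then remove the blank symbol.
    collapsed = [idx for idx, _ in groupby(best_path)]
    return [labels[idx] for idx in collapsed if idx != blank]


# Hypothetical sub-action vocabulary; index 0 is reserved for the blank.
SUB_ACTIONS = ["<blank>", "reach", "grasp", "pour", "place"]

# Toy per-frame scores (rows: frames, columns: classes); not data from the paper.
frame_scores = [
    [0.10, 0.70, 0.10, 0.05, 0.05],  # reach
    [0.10, 0.60, 0.20, 0.05, 0.05],  # reach
    [0.80, 0.10, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.10, 0.70, 0.05, 0.05],  # grasp
    [0.10, 0.05, 0.10, 0.70, 0.05],  # pour
    [0.10, 0.05, 0.10, 0.70, 0.05],  # pour
    [0.10, 0.05, 0.05, 0.10, 0.70],  # place
]

sequence = ctc_greedy_decode(frame_scores, SUB_ACTIONS)
print(sequence)  # ['reach', 'grasp', 'pour', 'place']
```

Because the blank separates repeated labels, the same sub-action can appear twice in a row in the decoded plan only when a blank frame falls between the two occurrences. In a decoupled pipeline of the kind the abstract describes, each decoded label would then select a separately learned motion primitive.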

Notes

1. Tasks such as table cleaning, water pouring, and table arrangement that involve multiple precise object manipulations over a long time span.
2. Tasks are combinations of multiple task primitives.

Author information

Authors and Affiliations

Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
Niharika Singh, Ravi Prakash, Laxmidhar Behera & Tushar Sandhan
TCS Research, Bengaluru, Karnataka, India
Samrat Dutta, Aditya Jain, Anima Majumder & Rajesh Sinha

Corresponding author

Correspondence to Tushar Sandhan.

Editor information

Editors and Affiliations

Indian Institute of Technology, Jammu, India
Harkeerat Kaur
Indian Institute of Technology, Jammu, India
Vinit Jakhetiya
Indian Institute of Technology, Ropar, India
Puneet Goyal
Indian Institute of Information Technology, Jabalpur, India
Pritee Khanna
Indian Institute of Technology, Roorkee, Uttarakhand, India
Balasubramanian Raman
Indian Institute of Technology, Roorkee, Uttarakhand, India
Sanjeev Kumar

About this paper

Cite this paper

Singh, N. et al. (2024). Imitation Learning of Long-Horizon Manipulation Tasks Through Temporal Sub-action Sequencing. In: Kaur, H., Jakhetiya, V., Goyal, P., Khanna, P., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2023. Communications in Computer and Information Science, vol 2010. Springer, Cham. https://doi.org/10.1007/978-3-031-58174-8_30

DOI: https://doi.org/10.1007/978-3-031-58174-8_30
Published: 03 July 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58173-1
Online ISBN: 978-3-031-58174-8
eBook Packages: Computer Science, Computer Science (R0)
