Abstract
Recent advances in large language models have highlighted their ability to encode large amounts of semantic knowledge for long-term autonomous decision-making, positioning them as a promising foundation for the cognitive capabilities of future home-assistant robots. However, while large language models can provide high-level decisions, there is still no unified paradigm for integrating them with a robot's perception and low-level actions. In this paper, we propose a framework centered on a large language model and integrated with visual perception and motion planning modules to address the robotic grasping task. Unlike traditional methods that focus only on generating stable grasps, our approach can handle personalized user instructions and perform tasks more effectively in home scenarios. It combines existing state-of-the-art models in a simple and effective way, without any fine-tuning, which makes it low-cost and easy to deploy. Experiments on a physical robot system demonstrate the feasibility of our approach.
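To make the described pipeline concrete, the sketch below shows one possible reading of the framework: a perception module lists the objects in view, the language model grounds the user's instruction to a single target, and a grasp planner produces a pose for the motion planner to execute. This is a minimal illustration only; every function name and return value (segment_objects, query_llm, plan_grasp, GraspPose) is a hypothetical placeholder, not the authors' actual interface.

```python
# Minimal, self-contained sketch of an LLM-centered grasping pipeline.
# All functions below are hypothetical stand-ins, not the paper's implementation.

from dataclasses import dataclass


@dataclass
class GraspPose:
    position: tuple[float, float, float]            # (x, y, z) in the robot base frame
    orientation: tuple[float, float, float, float]  # quaternion (x, y, z, w)
    width: float                                    # gripper opening in metres


def segment_objects(rgb_image) -> list[str]:
    """Placeholder for a segmentation model that returns labels of objects in the scene."""
    return ["mug", "apple", "remote control"]


def query_llm(instruction: str, labels: list[str]) -> str:
    """Placeholder for prompting an LLM to ground the instruction to one object label."""
    # A real system would send a prompt such as:
    #   "Objects on the table: mug, apple, remote control.
    #    Instruction: 'I am thirsty.' Which object should the robot grasp?"
    return labels[0]


def plan_grasp(label: str) -> GraspPose:
    """Placeholder for a grasp detector restricted to the chosen object's region."""
    return GraspPose(position=(0.4, 0.0, 0.1),
                     orientation=(0.0, 0.0, 0.0, 1.0),
                     width=0.06)


def grasp_from_instruction(rgb_image, instruction: str) -> GraspPose:
    labels = segment_objects(rgb_image)      # visual perception
    target = query_llm(instruction, labels)  # high-level decision by the LLM
    grasp = plan_grasp(target)               # low-level grasp generation
    # A motion planner would then move the arm to `grasp`;
    # trajectory planning and execution are omitted from this sketch.
    return grasp


if __name__ == "__main__":
    print(grasp_from_instruction(rgb_image=None, instruction="I want to drink some water"))
```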
Acknowledgement
This research was supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ23F030009 and by the Key Research Project of Zhejiang Lab (No. G2021NB0AL03).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Liao, J., et al. (2023). Decision-Making in Robotic Grasping with Large Language Models. In: Yang, H., et al. (eds.) Intelligent Robotics and Applications. ICIRA 2023. Lecture Notes in Computer Science, vol. 14271. Springer, Singapore. https://doi.org/10.1007/978-981-99-6495-6_36
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6494-9
Online ISBN: 978-981-99-6495-6
eBook Packages: Computer Science, Computer Science (R0)