
Decision-Making in Robotic Grasping with Large Language Models

  • Conference paper
Intelligent Robotics and Applications (ICIRA 2023)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14271))


Abstract

Recent advances in large language models have highlighted their potential to encode massive amounts of semantic knowledge for long-term autonomous decision-making, positioning them as a promising solution for powering the cognitive capabilities of future home-assistant robots. However, while large language models can provide high-level decisions, there is still no unified paradigm for integrating them with a robot's perception and low-level actions. In this paper, we propose a framework centered around a large language model, integrated with visual perception and motion planning modules, to investigate the robotic grasping task. Unlike traditional methods that focus only on generating stable grasps, our proposed approach can handle personalized user instructions and perform tasks more effectively in home scenarios. Our approach integrates existing state-of-the-art models in a simple and effective way, without requiring any fine-tuning, which makes it low-cost and easy to deploy. Experiments on a physical robot system demonstrate the feasibility of our approach.
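
To make the pipeline described above concrete, the sketch below outlines how an LLM-based decision step could sit between a visual perception module and a grasp execution module. This is a minimal illustrative sketch, not the authors' implementation: every function name, data field, and the example instruction is a hypothetical placeholder for the corresponding module in the framework.

# Minimal sketch of an LLM-centered grasping pipeline as described in the
# abstract: perception -> LLM decision -> grasp synthesis -> motion planning.
# All interfaces below are illustrative placeholders, not the paper's code.
from dataclasses import dataclass
from typing import List


@dataclass
class DetectedObject:
    label: str          # semantic class returned by the segmentation module
    grasp_score: float  # confidence of the best grasp candidate for this object


def perceive_scene() -> List[DetectedObject]:
    """Placeholder for the visual perception module (instance segmentation
    plus grasp detection). Returns the objects visible in the workspace."""
    return [DetectedObject("mug", 0.92), DetectedObject("banana", 0.88)]


def llm_select_target(instruction: str, objects: List[DetectedObject]) -> str:
    """Placeholder for the LLM decision step: given a personalized user
    instruction and the perceived object list, choose which object to grasp.
    In practice this would prompt a large language model; here a fixed
    answer is returned so the sketch runs end to end."""
    prompt = (
        f"User instruction: {instruction}\n"
        f"Visible objects: {[o.label for o in objects]}\n"
        "Which object should the robot grasp?"
    )
    _ = prompt  # would be sent to the language model API
    return objects[0].label


def plan_and_execute_grasp(target_label: str, objects: List[DetectedObject]) -> bool:
    """Placeholder for grasp pose selection, motion planning, and execution."""
    target = next(o for o in objects if o.label == target_label)
    print(f"Grasping '{target.label}' (grasp score {target.grasp_score:.2f})")
    return True


if __name__ == "__main__":
    scene = perceive_scene()
    target = llm_select_target("I'd like some coffee, bring me my cup.", scene)
    plan_and_execute_grasp(target, scene)

In such a design the language model only decides what to grasp from the perceived object list; stable grasp poses and collision-free trajectories remain the responsibility of dedicated perception and planning modules, which matches the modular, fine-tuning-free integration described in the abstract.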



Acknowledgement

This research was supported by the Zhejiang Provincial Natural Science Foundation of China (Grant No. LQ23F030009) and by the Key Research Project of Zhejiang Lab (No. G2021NB0AL03).

Author information


Corresponding author

Correspondence to Wei Song.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Liao, J. et al. (2023). Decision-Making in Robotic Grasping with Large Language Models. In: Yang, H., et al. Intelligent Robotics and Applications. ICIRA 2023. Lecture Notes in Computer Science, vol. 14271. Springer, Singapore. https://doi.org/10.1007/978-981-99-6495-6_36


  • DOI: https://doi.org/10.1007/978-981-99-6495-6_36

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6494-9

  • Online ISBN: 978-981-99-6495-6

  • eBook Packages: Computer Science; Computer Science (R0)
