Abstract
Recent advances in large language models have highlighted their ability to encode large amounts of semantic knowledge for long-term autonomous decision-making, positioning them as a promising foundation for the cognitive capabilities of future home-assistant robots. However, while large language models can provide high-level decisions, there is still no unified paradigm for integrating them with a robot's perception and low-level actions. In this paper, we propose a framework centered on a large language model and integrated with visual perception and motion planning modules to address the robotic grasping task. Unlike traditional methods that focus only on generating stable grasps, our approach can handle personalized user instructions and perform tasks more effectively in home scenarios. It combines existing state-of-the-art models in a simple and effective way, without any fine-tuning, which makes it low-cost and easy to deploy. Experiments on a physical robot system demonstrate the feasibility of our approach.
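To make the described pipeline concrete, the sketch below shows one possible reading of the framework: a perception module lists the objects in view, the language model grounds the user's instruction to a single target, and a grasp planner produces a pose for the motion planner to execute. This is a minimal illustration only; every function name and return value (segment_objects, query_llm, plan_grasp, GraspPose) is a hypothetical placeholder, not the authors' actual interface.

```python
# Minimal, self-contained sketch of an LLM-centered grasping pipeline.
# All functions below are hypothetical stand-ins, not the paper's implementation.

from dataclasses import dataclass


@dataclass
class GraspPose:
    position: tuple[float, float, float]            # (x, y, z) in the robot base frame
    orientation: tuple[float, float, float, float]  # quaternion (x, y, z, w)
    width: float                                    # gripper opening in metres


def segment_objects(rgb_image) -> list[str]:
    """Placeholder for a segmentation model that returns labels of objects in the scene."""
    return ["mug", "apple", "remote control"]


def query_llm(instruction: str, labels: list[str]) -> str:
    """Placeholder for prompting an LLM to ground the instruction to one object label."""
    # A real system would send a prompt such as:
    #   "Objects on the table: mug, apple, remote control.
    #    Instruction: 'I am thirsty.' Which object should the robot grasp?"
    return labels[0]


def plan_grasp(label: str) -> GraspPose:
    """Placeholder for a grasp detector restricted to the chosen object's region."""
    return GraspPose(position=(0.4, 0.0, 0.1),
                     orientation=(0.0, 0.0, 0.0, 1.0),
                     width=0.06)


def grasp_from_instruction(rgb_image, instruction: str) -> GraspPose:
    labels = segment_objects(rgb_image)      # visual perception
    target = query_llm(instruction, labels)  # high-level decision by the LLM
    grasp = plan_grasp(target)               # low-level grasp generation
    # A motion planner would then move the arm to `grasp`;
    # trajectory planning and execution are omitted from this sketch.
    return grasp


if __name__ == "__main__":
    print(grasp_from_instruction(rgb_image=None, instruction="I want to drink some water"))
```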
Acknowledgement
This research was supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ23F030009 and by the Key Research Project of Zhejiang Lab (No. G2021NB0AL03).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Liao, J., et al. (2023). Decision-Making in Robotic Grasping with Large Language Models. In: Yang, H., et al. (eds.) Intelligent Robotics and Applications. ICIRA 2023. Lecture Notes in Computer Science, vol. 14271. Springer, Singapore. https://doi.org/10.1007/978-981-99-6495-6_36
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6494-9
Online ISBN: 978-981-99-6495-6
eBook Packages: Computer Science, Computer Science (R0)