Abstract
Grasp generation is a crucial task in robotics, especially in unstructured environments, where robots must identify suitable grasp locations on objects and determine the grasp configuration. Recent advances in deep learning have produced end-to-end models for 6-DOF grasp generation that map input point clouds directly to grasp configurations without intermediate processing steps. However, these models often treat all points in a scene equally, which leads to suboptimal results in cluttered scenes, where occlusion makes some points far more informative than others. While attention mechanisms have shown promise in improving the accuracy and efficiency of various tasks in occluded scenes, their effectiveness for grasp generation remains an active area of research. Motivated by this potential, we explore attention mechanisms for improving grasp generation from 3D point clouds. Building upon VoteGrasp (Hoang et al. 2022), we integrate a wide range of attention modules and compare their effects and characteristics to identify the most effective combination for enhancing grasp generation performance. We also extend VoteGrasp by adding a semantic object classification term to the loss function, making our method more flexible than existing approaches. Through detailed experiments and analysis, our study provides insights into the use of attention mechanisms for 3D point cloud grasp generation, highlighting their potential to improve the accuracy and efficiency of robotic systems.
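To make the two extensions concrete, the sketch below shows (i) a generic non-local self-attention block over per-point features, representative of the family of attention modules the paper compares, and (ii) a combined objective with an auxiliary semantic classification term. This is a minimal illustrative sketch in PyTorch; the names (PointSelfAttention, total_loss, lambda_sem) and the (B, N, C) feature layout are our assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PointSelfAttention(nn.Module):
    """Non-local style self-attention over a set of N per-point features."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Linear(channels, channels)
        self.key = nn.Linear(channels, channels)
        self.value = nn.Linear(channels, channels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C) features from a PointNet++-style backbone (assumed layout)
        q, k, v = self.query(feats), self.key(feats), self.value(feats)
        # Affinities between every pair of points let occluded regions borrow
        # context from the rest of the scene before grasps are predicted.
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        return feats + attn @ v  # residual connection keeps training stable

def total_loss(grasp_loss, sem_logits, sem_labels, lambda_sem=0.5):
    # Grasp objective plus an auxiliary semantic object classification term;
    # lambda_sem is an assumed weighting hyperparameter.
    return grasp_loss + lambda_sem * F.cross_entropy(sem_logits, sem_labels)

Such a block would typically sit between the point-feature backbone and the grasp prediction heads; the families of modules compared in the paper include non-local (Wang et al. 2018), criss-cross (Huang et al. 2019), and CBAM-style (Woo et al. 2018) attention.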
Data Availability
The DexYCB and FPHAB datasets are publicly available at https://dex-ycb.github.io/ and https://guiggh.github.io/publications/first-person-hands/ respectively.
Code Availability
Not applicable.
References
Besl, P.J., McKay, N.D.: Method for registration of 3-d shapes. In: Sensor fusion IV: control paradigms and data structures, Spie, pp 586–606 (1992)
Bohg, J., Morales, A., Asfour, T., et al.: Data-driven grasp synthesis-a survey. IEEE Trans. Robot. 30(2), 289–309 (2013)
Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), IEEE, pp 60–65 (2005)
Calli, B., Singh, A., Bruce, J., et al.: Yale-cmu-berkeley dataset for robotic manipulation research. Int. J. Robot. Res. 36(3), 261–268 (2017)
Choi, C., Schwarting, W., DelPreto, J., et al.: Learning object grasping for soft robot hands. IEEE Robot. Automat. Lett. 3(3), 2370–2377 (2018)
Chu, F.J., Xu, R., Vela, P.A.: Real-world multiobject, multigrasp detection. IEEE Robot. Automat. Lett. 3(4), 3355–3362 (2018)
Ciocarlie, M., Goldfeder, C., Allen, P.: Dimensionality reduction for hand-independent dexterous robotic grasping. In: 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, pp 3270–3275 (2007)
Deng, H., Birdal, T., Ilic, S.: Ppfnet: Global context aware local features for robust 3d point matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 195–205 (2018)
Dias, A.S., Brites, C., Ascenso, J., et al.: Sift-based homographies for efficient multiview distributed visual sensing. IEEE Sens. J. 15(5), 2643–2656 (2014)
Fang, H.S., Wang, C., Gou, M., et al.: Graspnet-1billion: A large-scale benchmark for general object grasping. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 11,444–11,453 (2020)
Feng, M., Zhang, L., Lin, X., et al.: Point attention network for semantic segmentation of 3d point clouds. Pattern Recognit. 107, 107,446 (2020)
Fu, J., Liu, J., Tian, H., et al.: Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3146–3154 (2019)
Gou, M., Fang, H.S., Zhu, Z., et al.: Rgb matters: Learning 7-dof grasp poses on monocular rgbd images. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 13,459–13,466 (2021)
He, Y., Huang, H., Fan, H., et al.: Ffb6d: A full flow bidirectional fusion network for 6d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3003–3013 (2021)
Hoang, D.C., Stork, J.A., Stoyanov, T.: Context-aware grasp generation in cluttered scenes. In: IEEE International Conference on Robotics and Automation (ICRA 2022), Philadelphia, USA, May 23-27, 2022 (2022)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7132–7141 (2018a)
Hu, S.M., Cai, J.X., Lai, Y.K.: Semantic labeling and instance segmentation of 3d point clouds using patch context analysis and multiscale processing. IEEE Trans. Vis. Comput. Graph. 26(7), 2485–2498 (2018b)
Huang, Z., Wang, X., Huang, L., et al.: Ccnet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 603–612 (2019)
Lenz, I., Lee, H., Saxena, A.: Deep learning for detecting robotic grasps. Int. J. Robot. Res. 34(4–5), 705–724 (2015)
Liang, H., Ma, X., Li, S., et al.: Pointnetgpd: Detecting grasp configurations from point sets. In: 2019 International Conference on Robotics and Automation (ICRA), IEEE, pp 3629–3635 (2019)
Mahler, J., Matl, M., Liu, X., et al.: Dex-net 3.0: Computing robust vacuum suction grasp targets in point clouds using a new analytic model and deep learning. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 5620–5627 (2018)
Miller, A.T., Allen, P.K.: Graspit! a versatile simulator for robotic grasping. IEEE Robot. Automat. Mag. 11(4), 110–122 (2004)
Morrison, D., Corke, P., Leitner, J.: Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. arXiv:1804.05172 (2018)
Mousavian, A., Eppner, C., Fox, D.: 6-dof graspnet: Variational grasp generation for object manipulation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 2901–2910 (2019)
Muñoz, E., Konishi, Y., Murino, V., et al.: Fast 6d pose estimation for texture-less objects from a single rgb image. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 5623–5630 (2016)
Ni, P., Zhang, W., Zhu, X., et al.: Pointnet++ grasping: learning an end-to-end spatial grasp generation algorithm from sparse point clouds. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 3619–3625 (2020)
Paigwar, A., Erkent, O., Wolf, C., et al.: Attentional pointnet for 3d-object detection in point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)
Ten Pas, A., Gualtieri, M., Saenko, K., et al.: Grasp pose detection in point clouds. Int. J. Robot. Res. 36(13–14), 1455–1473 (2017)
Qi, C.R., Su, H., Mo, K., et al.: Pointnet: Deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 652–660 (2017a)
Qi, C.R., Yi, L., Su, H., et al.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv Neural Inf. Process. Syst. 30 (2017b)
Qi, C.R., Litany, O., He, K., et al.: Deep hough voting for 3d object detection in point clouds. In: proceedings of the IEEE/CVF International Conference on Computer Vision, pp 9277–9286 (2019)
Redmon, J., Angelova, A.: Real-time grasp detection using convolutional neural networks. In: 2015 IEEE international conference on robotics and automation (ICRA), IEEE, pp 1316–1322 (2015)
Shi, Y., Chang, A.X., Wu, Z., et al.: Hierarchy denoising recursive autoencoders for 3d scene layout prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 1771–1780 (2019)
Wang, C., Xu, D., Zhu, Y., et al.: Densefusion: 6d object pose estimation by iterative dense fusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3343–3352 (2019)
Wang, X., Girshick, R., Gupta, A., et al.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7794–7803 (2018)
Woo, S., Park, J., Lee, J.Y., et al.: Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19 (2018)
Wu, D., Zhuang, Z., Xiang, C., et al.: 6d-vnet: End-to-end 6-dof vehicle pose estimation from monocular rgb images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)
Xie, S., Liu, S., Chen, Z., et al.: Attentional shapecontextnet for point cloud recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4606–4615 (2018)
Ye, X., Li, J., Huang, H., et al.: 3d recurrent neural networks with context fusion for point cloud semantic segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 403–417 (2018)
Yue, K., Sun, M., Yuan, Y., et al.: Compact generalized non-local network. In: Advances in Neural Information Processing Systems, pp 6510–6519 (2018)
Zeng, A., Yu, K.T., Song, S., et al.: Multi-view self-supervised deep learning for 6d pose estimation in the amazon picking challenge. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp 1383–1386 (2017)
Zhang, W., Xiao, C.: Pcan: 3d attention map learning using contextual information for point cloud based retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12,436–12,445 (2019)
Zhao, H., Jiang, L., Jia, J., et al.: Point transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 16,259–16,268 (2021)
Author information
Contributions
All the authors conceived the research, designed and implemented the algorithm, and drafted the submitted version of the paper.
Ethics declarations
Conflicts of Interest
The authors declare that they have no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hoang, DC., Nguyen, AN., Vu, VD. et al. Grasp Configuration Synthesis from 3D Point Clouds with Attention Mechanism. J Intell Robot Syst 109, 71 (2023). https://doi.org/10.1007/s10846-023-02007-w