
GraspFusionNet: a two-stage multi-parameter grasp detection network based on RGB–XYZ fusion in dense clutter

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

Robotic grasping of a diverse range of novel objects in dense clutter is a great challenge and is critical to many applications. However, current methods are vulnerable to perception uncertainty for densely stacked objects, which limits the accuracy of multi-parameter grasp prediction. In this paper, we propose a two-stage grasp detection pipeline consisting of a sampling stage and a prediction stage. The sampling stage applies a fully convolutional network to generate grasp proposal regions that contain potentially graspable objects. Within each grasp proposal region, the prediction stage predicts complete grasp parameters based on a fusion of RGB–XYZ heightmaps, which are converted from the color and depth images. To perceive the essential structures of stable grasping, a 2D CNN and a 3D CNN are used to learn color and geometric features, respectively, for multi-parameter grasp prediction. The direct mapping from heightmaps to grasp parameters is realized with a multi-task loss. Experiments on a self-built dataset and an open dataset are conducted to analyze the network performance. The results indicate that the proposed two-stage method outperforms other grasp detection algorithms. Robotic experiments demonstrate generalization ability and robustness to novel objects in dense clutter, and the proposed method achieves an average grasp success rate of 82.4%, which is better than other state-of-the-art methods. Our self-built dataset and robotic grasping video are available at https://github.com/liuwenhai/toteGrasping.git.
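To make the pipeline concrete, the sketch below illustrates the two-stage idea described in the abstract in PyTorch: a fully convolutional sampling network scores per-pixel graspability on a heightmap, and a fusion head combines 2D CNN features from an RGB crop with 3D CNN features from a voxelized XYZ crop, trained with a multi-task loss over grasp parameters and grasp quality. All layer sizes, the four-value grasp encoding (offsets, angle, width), and the loss weighting are illustrative assumptions and not the authors' exact GraspFusionNet architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SamplingFCN(nn.Module):
    """Stage 1 (sketch): a small fully convolutional network that maps an
    RGB + height heightmap to a per-pixel graspability score, from which
    grasp proposal regions could be cropped."""

    def __init__(self, in_ch=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),  # graspability logits, same resolution as input
        )

    def forward(self, heightmap):
        return self.body(heightmap)


class FusionGraspHead(nn.Module):
    """Stage 2 (sketch): fuses 2D CNN features from the RGB crop with 3D CNN
    features from a voxelized XYZ crop, then regresses grasp parameters
    (position offsets, angle, width) and a grasp quality score."""

    def __init__(self):
        super().__init__()
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.xyz_branch = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(64 + 32, 128)
        self.grasp_reg = nn.Linear(128, 4)  # dx, dy, angle, width (illustrative)
        self.quality = nn.Linear(128, 1)    # grasp quality logit

    def forward(self, rgb_crop, xyz_voxels):
        f2d = self.rgb_branch(rgb_crop).flatten(1)     # color features
        f3d = self.xyz_branch(xyz_voxels).flatten(1)   # geometric features
        feat = F.relu(self.fc(torch.cat([f2d, f3d], dim=1)))
        return self.grasp_reg(feat), self.quality(feat)


def multi_task_loss(pred_params, pred_quality, gt_params, gt_quality):
    """Sketch of a multi-task objective: regression on grasp parameters plus
    binary classification on grasp quality; the 0.5 weighting is illustrative."""
    reg = F.smooth_l1_loss(pred_params, gt_params)
    cls = F.binary_cross_entropy_with_logits(pred_quality, gt_quality)
    return reg + 0.5 * cls


if __name__ == "__main__":
    # Toy forward pass with random tensors standing in for real heightmaps.
    heightmap = torch.randn(1, 4, 224, 224)       # RGB + height channels
    proposals = SamplingFCN()(heightmap)          # per-pixel graspability map
    rgb_crop = torch.randn(1, 3, 64, 64)          # crop around one proposal
    xyz_voxels = torch.randn(1, 1, 32, 32, 32)    # voxelized XYZ crop
    params, quality = FusionGraspHead()(rgb_crop, xyz_voxels)
    loss = multi_task_loss(params, quality, torch.zeros(1, 4), torch.zeros(1, 1))
    print(proposals.shape, params.shape, quality.shape, loss.item())
```

In the paper, the fused color and geometric features are what enable the direct mapping from heightmaps to complete grasp parameters; in this sketch that role is played by the concatenated 2D/3D feature vector fed to the regression and quality heads.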





Acknowledgements

This work was supported by the National Natural Science Foundation of China (51775332, 51675329, 51675342, 51975350) and National Key Scientific Instruments and Equipment Development Projects of China (2016YFF0101602, 2016YFC0104104).

Author information

Corresponding authors

Correspondence to Weiming Wang or Jie Hu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, W., Liu, W., Hu, J. et al. GraspFusionNet: a two-stage multi-parameter grasp detection network based on RGB–XYZ fusion in dense clutter. Machine Vision and Applications 31, 58 (2020). https://doi.org/10.1007/s00138-020-01108-y

