
GraspFusionNet: a two-stage multi-parameter grasp detection network based on RGB–XYZ fusion in dense clutter

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

Robotic grasping of a diverse range of novel objects in dense clutter is a great challenge and is critical to many applications. However, current methods are vulnerable to perception uncertainty for densely stacked objects, which limits the accuracy of multi-parameter grasp prediction. In this paper, we propose a two-stage grasp detection pipeline consisting of a sampling stage and a prediction stage. The sampling stage applies a fully convolutional network to generate grasp proposal regions that contain potentially graspable objects. Within each grasp proposal region, the prediction stage predicts complete grasp parameters based on a fusion of RGB–XYZ heightmaps, which are converted from the color and depth images. To perceive the essential structures of stable grasping, a 2D CNN and a 3D CNN are used to learn color and geometric features, respectively, for multi-parameter grasp prediction. The direct mapping from heightmaps to grasp parameters is realized with a multi-task loss. Experiments on a self-built dataset and an open dataset are conducted to analyze the network performance. The results indicate that the proposed two-stage method outperforms other grasp detection algorithms. Robotic experiments demonstrate generalization ability and robustness to novel objects in dense clutter, and the proposed method achieves an average grasp success rate of 82.4%, which is better than other state-of-the-art methods. Our self-built dataset and robotic grasping video are available at https://github.com/liuwenhai/toteGrasping.git.
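To make the pipeline concrete, the sketch below illustrates the two-stage idea described in the abstract in PyTorch: a fully convolutional sampling network scores per-pixel graspability on a heightmap, and a fusion head combines 2D CNN features from an RGB crop with 3D CNN features from a voxelized XYZ crop, trained with a multi-task loss over grasp parameters and grasp quality. All layer sizes, the four-value grasp encoding (offsets, angle, width), and the loss weighting are illustrative assumptions and not the authors' exact GraspFusionNet architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SamplingFCN(nn.Module):
    """Stage 1 (sketch): a small fully convolutional network that maps an
    RGB + height heightmap to a per-pixel graspability score, from which
    grasp proposal regions could be cropped."""

    def __init__(self, in_ch=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),  # graspability logits, same resolution as input
        )

    def forward(self, heightmap):
        return self.body(heightmap)


class FusionGraspHead(nn.Module):
    """Stage 2 (sketch): fuses 2D CNN features from the RGB crop with 3D CNN
    features from a voxelized XYZ crop, then regresses grasp parameters
    (position offsets, angle, width) and a grasp quality score."""

    def __init__(self):
        super().__init__()
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.xyz_branch = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(64 + 32, 128)
        self.grasp_reg = nn.Linear(128, 4)  # dx, dy, angle, width (illustrative)
        self.quality = nn.Linear(128, 1)    # grasp quality logit

    def forward(self, rgb_crop, xyz_voxels):
        f2d = self.rgb_branch(rgb_crop).flatten(1)     # color features
        f3d = self.xyz_branch(xyz_voxels).flatten(1)   # geometric features
        feat = F.relu(self.fc(torch.cat([f2d, f3d], dim=1)))
        return self.grasp_reg(feat), self.quality(feat)


def multi_task_loss(pred_params, pred_quality, gt_params, gt_quality):
    """Sketch of a multi-task objective: regression on grasp parameters plus
    binary classification on grasp quality; the 0.5 weighting is illustrative."""
    reg = F.smooth_l1_loss(pred_params, gt_params)
    cls = F.binary_cross_entropy_with_logits(pred_quality, gt_quality)
    return reg + 0.5 * cls


if __name__ == "__main__":
    # Toy forward pass with random tensors standing in for real heightmaps.
    heightmap = torch.randn(1, 4, 224, 224)       # RGB + height channels
    proposals = SamplingFCN()(heightmap)          # per-pixel graspability map
    rgb_crop = torch.randn(1, 3, 64, 64)          # crop around one proposal
    xyz_voxels = torch.randn(1, 1, 32, 32, 32)    # voxelized XYZ crop
    params, quality = FusionGraspHead()(rgb_crop, xyz_voxels)
    loss = multi_task_loss(params, quality, torch.zeros(1, 4), torch.zeros(1, 1))
    print(proposals.shape, params.shape, quality.shape, loss.item())
```

In the paper, the fused color and geometric features are what enable the direct mapping from heightmaps to complete grasp parameters; in this sketch that role is played by the concatenated 2D/3D feature vector fed to the regression and quality heads.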





Acknowledgements

This work was supported by the National Natural Science Foundation of China (51775332, 51675329, 51675342, 51975350) and National Key Scientific Instruments and Equipment Development Projects of China (2016YFF0101602, 2016YFC0104104).

Author information

Corresponding authors

Correspondence to Weiming Wang or Jie Hu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, W., Liu, W., Hu, J. et al. GraspFusionNet: a two-stage multi-parameter grasp detection network based on RGB–XYZ fusion in dense clutter. Machine Vision and Applications 31, 58 (2020). https://doi.org/10.1007/s00138-020-01108-y

