Abstract
Robotic grasping of a diverse range of novel objects in dense clutter is a great challenge and is critical to many applications. However, current methods are vulnerable to perception uncertainty for densely stacked objects, which limits the accuracy of multi-parameter grasp prediction. In this paper, we propose a two-stage grasp detection pipeline consisting of a sampling stage and a prediction stage. The sampling stage applies a fully convolutional network to generate grasp proposal regions that contain potentially graspable objects. Within each grasp proposal region, the prediction stage predicts complete grasp parameters based on the fusion of RGB–XYZ heightmaps, which are converted from color and depth images. To perceive the structures essential to stable grasping, a 2D CNN and a 3D CNN learn color and geometric features, respectively, for multi-parameter grasp prediction. The direct mapping from heightmaps to grasp parameters is realized through a multi-task loss. Experiments on a self-built dataset and an open dataset are conducted to analyze network performance. The results indicate that the proposed two-stage method outperforms other grasp detection algorithms. Robotic experiments demonstrate its generalization ability and robustness for novel objects in dense clutter: the proposed method achieves an average grasp success rate of 82.4%, which is also better than other state-of-the-art methods. Our self-built dataset and robotic grasping video are available at https://github.com/liuwenhai/toteGrasping.git.
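The abstract's RGB–XYZ heightmaps are built by converting a registered color and depth image into top-down orthographic maps of the workspace. The paper does not spell out this conversion, so below is a minimal sketch of one common way to do it: deproject depth pixels through the camera intrinsics, transform the points into the workspace frame, and rasterize them onto a grid, keeping the highest point per cell. All function and parameter names here (`rgb_xyz_heightmaps`, `workspace`, `resolution`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rgb_xyz_heightmaps(color, depth, intrinsics, cam_pose,
                       workspace, resolution=0.002):
    """Project an RGB-D image into top-down RGB and XYZ heightmaps.

    color:      (H, W, 3) uint8 image registered to the depth frame
    depth:      (H, W) depth in meters
    intrinsics: 3x3 pinhole camera matrix
    cam_pose:   4x4 camera-to-workspace transform
    workspace:  ((x_min, x_max), (y_min, y_max)) bounds in meters
    resolution: heightmap cell size in meters
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    # Deproject pixels to 3D points in the camera frame.
    x = (u - intrinsics[0, 2]) * z / intrinsics[0, 0]
    y = (v - intrinsics[1, 2]) * z / intrinsics[1, 1]
    pts = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    # Transform points into the workspace (table) frame.
    pts = (cam_pose @ pts.T).T[:, :3]
    cols = color.reshape(-1, 3)

    (x_min, x_max), (y_min, y_max) = workspace
    gw = int(round((x_max - x_min) / resolution))
    gh = int(round((y_max - y_min) / resolution))
    rgb_map = np.zeros((gh, gw, 3), dtype=np.uint8)
    xyz_map = np.zeros((gh, gw, 3), dtype=np.float32)

    # Keep only points inside the workspace bounds.
    mask = ((pts[:, 0] >= x_min) & (pts[:, 0] < x_max) &
            (pts[:, 1] >= y_min) & (pts[:, 1] < y_max))
    pts, cols = pts[mask], cols[mask]
    # Sort by ascending height so the highest point wins each cell.
    order = np.argsort(pts[:, 2])
    pts, cols = pts[order], cols[order]

    gx = ((pts[:, 0] - x_min) / resolution).astype(int)
    gy = ((pts[:, 1] - y_min) / resolution).astype(int)
    rgb_map[gy, gx] = cols   # later (higher) points overwrite earlier ones
    xyz_map[gy, gx] = pts
    return rgb_map, xyz_map
```

The resulting maps are pixel-aligned, so the RGB heightmap can feed a 2D CNN while the XYZ heightmap supplies the per-cell geometry that a volumetric (3D CNN) branch can consume.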
Acknowledgements
This work was supported by the National Natural Science Foundation of China (51775332, 51675329, 51675342, 51975350) and National Key Scientific Instruments and Equipment Development Projects of China (2016YFF0101602, 2016YFC0104104).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Wang, W., Liu, W., Hu, J. et al. GraspFusionNet: a two-stage multi-parameter grasp detection network based on RGB–XYZ fusion in dense clutter. Machine Vision and Applications 31, 58 (2020). https://doi.org/10.1007/s00138-020-01108-y