Segmentation based 6D pose estimation using integrated shape pattern and RGB information

  • Industrial and Commercial Application
Pattern Analysis and Applications

Abstract

Point clouds are currently among the most common representations of the 3D world. However, recognizing objects and estimating their poses from point clouds remains challenging because the 3D points are unordered. In this paper, a unified deep learning framework for 3D scene segmentation and 6D object pose estimation is proposed. To segment foreground objects accurately, we propose a novel shape pattern aggregation module, PointDoN, which learns meaningful deep geometric representations from both the Difference of Normals (DoN) and the original spatial coordinates of the point cloud. PointDoN is flexible enough to be applied to any convolutional network, and it improves results on the popular tasks of point cloud classification and semantic segmentation. Once the objects are segmented, the points belonging to each object in the scene can be identified, which allows the 6D pose of each object to be estimated within a local region of interest. To obtain accurate estimates, we propose a new 6D pose estimation approach that combines 2D features extracted from RGB images with 3D features extracted from point clouds. Specifically, the 3D features are extracted by a CNN-based architecture whose input is an XYZ map converted from the original point cloud. Experiments on publicly available point cloud datasets show that our method achieves satisfactory results in both segmentation and 6D pose estimation.
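The Difference of Normals operator that PointDoN builds on (introduced by Ioannou et al., 2012) compares a point's unit normal estimated over a small support radius r_s with the normal estimated over a large radius r_l: DoN(p) = (n(p, r_s) - n(p, r_l)) / 2. Flat regions give values near zero, while edges and curved regions give larger magnitudes. The following is a minimal NumPy sketch under stated assumptions: normals come from PCA over a brute-force radius neighbourhood and are oriented toward an assumed sensor viewpoint, and the hypothetical helper `pointdon_input` simply stacks raw XYZ with DoN as a 6-channel per-point input. It illustrates the DoN feature only; it is not the authors' PointDoN module.

```python
import numpy as np


def estimate_normals(points, radius, viewpoint=(0.0, 0.0, 10.0)):
    """Unit normal per point via PCA of all neighbours within `radius`.

    Brute-force O(N^2) neighbour search: fine for a sketch, not for large scans.
    Normals are flipped to face `viewpoint` (an assumed sensor position) so that
    normals estimated at different scales share a consistent orientation.
    """
    points = np.asarray(points, dtype=np.float64)
    viewpoint = np.asarray(viewpoint, dtype=np.float64)
    normals = np.zeros_like(points)
    for i, p in enumerate(points):
        nbrs = points[np.linalg.norm(points - p, axis=1) <= radius]
        if len(nbrs) < 3:
            # Too few neighbours to fit a plane; fall back to the view direction.
            n = viewpoint - p
        else:
            # Eigenvector of the smallest covariance eigenvalue = surface normal
            # (eigh returns eigenvalues in ascending order).
            _, eigvecs = np.linalg.eigh(np.cov(nbrs.T))
            n = eigvecs[:, 0]
        n = n / np.linalg.norm(n)
        if np.dot(n, viewpoint - p) < 0:
            n = -n
        normals[i] = n
    return normals


def difference_of_normals(points, r_small, r_large):
    """DoN(p) = (n(p, r_small) - n(p, r_large)) / 2, one 3-vector per point."""
    return 0.5 * (estimate_normals(points, r_small) - estimate_normals(points, r_large))


def pointdon_input(points, r_small, r_large):
    """Hypothetical (N, 6) input: raw XYZ concatenated with DoN features."""
    points = np.asarray(points, dtype=np.float64)
    return np.hstack([points, difference_of_normals(points, r_small, r_large)])


# Flat 10 x 10 patch on z = 0: normals agree at both scales, so DoN is ~0.
xs, ys = np.meshgrid(np.arange(10) * 0.1, np.arange(10) * 0.1)
plane = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)])
feats = pointdon_input(plane, r_small=0.15, r_large=0.35)
print(feats.shape)  # (100, 6)
```

On real scans, the two radii would need to be tuned to the point density and to the size of the geometric details of interest; a spatial index (k-d tree) would replace the brute-force search.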
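Once segmentation assigns an object label to every point, each object's region of interest can be cut out of the scene before its pose is estimated. The sketch below assumes an organized point cloud, so the per-point labels form an (H, W) label map aligned pixel-for-pixel with the RGB image and an XYZ map. It crops both to the bounding box of one object's mask (plus an optional margin) and zeroes the background pixels. The function name `crop_object_roi` and the zero-masking choice are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np


def crop_object_roi(rgb, xyz, labels, obj_id, margin=0):
    """Crop RGB (H, W, 3) and XYZ (H, W, 3) maps to one object's mask.

    `labels` is an (H, W) map of predicted object ids. Returns None if the
    object is absent; otherwise (rgb_crop, xyz_crop) with background zeroed.
    """
    mask = labels == obj_id
    if not mask.any():
        return None
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    h, w = labels.shape
    # Bounding box of the mask, grown by `margin` and clipped to the image.
    r0, r1 = max(rows[0] - margin, 0), min(rows[-1] + margin + 1, h)
    c0, c1 = max(cols[0] - margin, 0), min(cols[-1] + margin + 1, w)
    keep = mask[r0:r1, c0:c1, None]  # (h', w', 1), broadcasts over channels
    return rgb[r0:r1, c0:c1] * keep, xyz[r0:r1, c0:c1] * keep


# Toy scene: object 7 occupies a 2 x 2 block of a 5 x 5 image.
labels = np.zeros((5, 5), dtype=int)
labels[1:3, 2:4] = 7
rgb = np.full((5, 5, 3), 255, dtype=np.uint8)
xyz = np.ones((5, 5, 3))
rgb_roi, xyz_roi = crop_object_roi(rgb, xyz, labels, obj_id=7, margin=1)
print(rgb_roi.shape)  # (4, 4, 3)
```

Restricting the pose estimator to this crop is what makes the per-object estimation "local": clutter from other objects is masked out before features are extracted.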
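The 3D branch described in the abstract takes an XYZ map as input: an image-shaped array whose three channels hold each pixel's X, Y and Z coordinates, so that an ordinary 2D CNN can process the geometry on the same grid as the RGB image. Assuming the point cloud comes from an RGB-D camera with a known pinhole model (focal lengths fx, fy and principal point cx, cy are assumptions here, not values from the paper), such a map can be built by back-projecting the depth image with X = (u - cx) Z / fx and Y = (v - cy) Z / fy. A minimal NumPy sketch:

```python
import numpy as np


def depth_to_xyz_map(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth image into an (H, W, 3) XYZ map.

    Assumes a pinhole camera; u indexes columns and v indexes rows.
    Pixels with depth 0 (no measurement) map to the origin (0, 0, 0).
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # both (H, W)
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


# Toy example: constant depth of 2 with unit focal length, principal point at (0, 0).
xyz = depth_to_xyz_map(np.full((4, 6), 2.0), fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(xyz.shape)  # (4, 6, 3)
print(xyz[1, 3])  # [6. 2. 2.]
```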

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2018YFB1703201), the Shanghai Action Plan for Science and Technology Innovation (No. 19511109500), the Chinese Ministry of Education Research Fund on Intelligent Manufacturing (No. MCM20180703), and the Eighth Research Institute Fund on Industry-University-Research of China Aerospace Science and Technology Corporation (No. USCAST2020-6).

Author information

Corresponding author

Correspondence to Chaochen Gu.

About this article

Cite this article

Gu, C., Feng, Q., Lu, C. et al. Segmentation based 6D pose estimation using integrated shape pattern and RGB information. Pattern Anal Applic 25, 1055–1073 (2022). https://doi.org/10.1007/s10044-022-01078-z

  • DOI: https://doi.org/10.1007/s10044-022-01078-z
