Abstract
Point cloud is currently the most typical representation in describing the 3D world. However, recognizing objects as well as the poses from point clouds is still a great challenge due to the property of disordered 3D data arrangement. In this paper, a unified deep learning framework for 3D scene segmentation and 6D object pose estimation is proposed. In order to accurately segment foreground objects, a novel shape pattern aggregation module called PointDoN is proposed, which could learn meaningful deep geometric representations from both Difference of Normals (DoN) and the initial spatial coordinates of point cloud. Our PointDoN is flexible to be applied to any convolutional networks and shows improvements in the popular tasks of point cloud classification and semantic segmentation. Once the objects are segmented, the range of point clouds for each object in the entire scene could be specified, which enables us to further estimate the 6D pose for each object within local region of interest. To acquire good estimate, we propose a new 6D pose estimation approach that incorporates both 2D and 3D features generated from RGB images and point clouds, respectively. Specifically, 3D features are extracted via a CNN-based architecture where the input is XYZ map converted from the initial point cloud. Experiments showed that our method could achieve satisfactory results on the publicly available point cloud datasets in both tasks of segmentation and 6D pose estimation.
Similar content being viewed by others
References
Armeni I, Sener O, Zamir AR, Jiang H, Brilakis I, Fischer M, Savarese S (2016) 3d semantic parsing of large-scale indoor spaces. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp. 1534–1543
Aubry M, Maturana D, Efros AA, Russell BC, Sivic J (2014) Seeing 3d chairs: exemplar part-based 2d-3d alignment using a large dataset of cad models. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3762–3769
Bai S, Bai X, Zhou Z, Zhang Z, Jan Latecki L (2016) Gift: a real-time and scalable 3d shape search engine. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5023–5032
Brachmann E, Krull A, Michel F, Gumhold S, Shotton J, Rother C (2014) Learning 6d object pose estimation using 3d object coordinates. In: European conference on computer vision, pp. 536–551. Springer
Brachmann E, Michel F, Krull A, Ying Yang M, Gumhold S, et al (2016) Uncertainty-driven 6d pose estimation of objects and scenes from a single rgb image. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3364–3372
Brock A, Lim T, Ritchie JM, Weston N (2016) Generative and discriminative voxel modeling with convolutional neural networks. arXiv preprint arXiv:1608.04236
Cao Z, Huang Q, Karthik R (2017) 3d object classification via spherical projections. In: 2017 International Conference on 3D Vision (3DV), pp. 566–574. IEEE
Collet A, Martinez M, Srinivasa SS (2011) The moped framework: object recognition and pose estimation for manipulation. Int J Robot Res 30(10):1284–1306
Ferrari V, Tuytelaars T, Van Gool L (2006) Simultaneous object recognition and segmentation from single or multiple model views. Int J Comput Vis 67(2):159–188
Gu C, Lu C, Gu C, Guan X (2019) Viewpoint estimation using triplet loss with a novel viewpoint-based input selection strategy. In: Journal of Physics: Conference Series, vol. 1207, p. 012009. IOP Publishing
Hinterstoisser S, Holzer S, Cagniart C, Ilic S, Konolige K, Navab N, Lepetit V (2011) Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes. In: 2011 international conference on computer vision, pp. 858–865. IEEE
Hinterstoisser S, Lepetit V, Ilic S, Holzer S, Bradski G, Konolige K, Navab N (2012) Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. In: Asian conference on computer vision, pp. 548–562. Springer
Huang H, Kalogerakis E, Chaudhuri S, Ceylan D, Kim VG, Yumer E (2017) Learning local shape descriptors from part correspondences with multiview convolutional networks. ACM Trans Gr (TOG) 37(1):1–14
Huang J, You S (2016) Point cloud labeling using 3d convolutional neural network. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 2670–2675. IEEE
Huang Q, Wang W, Neumann U (2018) Recurrent slice networks for 3d segmentation of point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2626–2635
Ioannou Y, Taati B, Harrap R, Greenspan M (2012) Difference of normals as a multi-scale operator in unorganized point clouds. In: 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pp. 501–508. IEEE
Kalogerakis E, Averkiou M, Maji S, Chaudhuri S (2017) 3d shape segmentation with projective convolutional networks. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3779–3788
Kehl W, Manhardt F, Tombari F, Ilic S, Navab N (2017) Ssd-6d: Making rgb-based 3d detection and 6d pose estimation great again. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1521–1529
Kehl W, Milletari F, Tombari F, Ilic S, Navab N (2016) Deep learning of local rgb-d patches for 3d object detection and 6d pose estimation. In: European conference on computer vision, pp. 205–220. Springer
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Landrieu L, Simonovsky M (2018) Large-scale point cloud semantic segmentation with superpoint graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4558–4567
Le T, Duan Y (2018) Pointgrid: A deep network for 3d shape understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9204–9214
Lepetit V, Moreno-Noguer F, Fua P (2009) Epnp: an accurate o (n) solution to the pnp problem. Int J of Comput Vis 81(2):155
Li C, Bai J, Hager GD (2018) A unified framework for multi-view multi-class object pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 254–269
Li J, Chen BM, Hee Lee G (2018) So-net: Self-organizing network for point cloud analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9397–9406
Li Y, Bu R, Sun M, Wu W, Di X, Chen B (2018) Pointcnn: convolution on x-transformed points. In: Advances in neural information processing systems, pp. 820–830
Li Y, Wang G, Ji X, Xiang Y, Fox D (2018) Deepim: Deep iterative matching for 6d pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 683–698
Li Z, Sun Y, Tang J (2021) Ctnet: Context-based tandem network for semantic segmentation
Li Z, Tang J, Mei T (2019) Deep collaborative embedding for social image understanding. IEEE Trans Pattern Anal Mach Intell 41(9):2070–2083. https://doi.org/10.1109/TPAMI.2018.2852750
Lu C, Gu C, Wu K, Xia S, Wang H, Guan X (2020) Deep transfer neural network using hybrid representations of domain discrepancy. Neurocomputing 409:60–73
Lu C, Wang H, Gu C, Wu K, Guan X (2018) Viewpoint estimation for workpieces with deep transfer learning from cold to hot. In: International Conference on Neural Information Processing, pp. 21–32. Springer
Maturana D, Scherer S (2015) 3d convolutional neural networks for landing zone detection from lidar. In: 2015 IEEE international conference on robotics and automation (ICRA), pp. 3471–3478. IEEE
Maturana D, Scherer S (2015) Voxnet: A 3d convolutional neural network for real-time object recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 922–928. IEEE
Mousavian A, Anguelov D, Flynn J, Kosecka J (2017) 3d bounding box estimation using deep learning and geometry. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7074–7082
Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE international conference on computer vision, pp. 1520–1528
Qi CR, Liu W, Wu C, Su H, Guibas LJ (2018) Frustum pointnets for 3d object detection from rgb-d data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 918–927
Qi CR, Su H, Mo K, Guibas LJ (2017) Pointnet: Deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 652–660
Qi CR, Su H, Nießner M, Dai A, Yan M, Guibas LJ (2016) Volumetric and multi-view cnns for object classification on 3d data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5648–5656
Qi CR, Yi L, Su H, Guibas LJ (2017) Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In: advances in neural information processing systems, pp. 5099–5108
Rad M, Lepetit V (2017) Bb8: A scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3828–3836
Riegler G, Osman Ulusoy A, Geiger A (2017) Octnet: Learning deep 3d representations at high resolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3577–3586
Rios-Cabrera R, Tuytelaars T (2013) Discriminatively trained templates for 3d object detection: a real time scalable approach. In: Proceedings of the IEEE international conference on computer vision, pp. 2048–2055
Rothganger F, Lazebnik S, Schmid C, Ponce J (2003) 3d object modeling and recognition using affine-invariant patches and multi-view spatial constraints. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., vol. 2, pp. II–272. IEEE
Schwarz M, Schulz H, Behnke S (2015) Rgb-d object recognition and pose estimation based on pre-trained convolutional neural network features. In: 2015 IEEE international conference on robotics and automation (ICRA), pp. 1329–1335. IEEE
Sedaghat N, Zolfaghari M, Amiri E, Brox T (2016) Orientation-boosted voxel nets for 3d object recognition. arXiv preprint arXiv:1604.03351
Singhirunnusorn K, Fahimi F, Aygun R (2018) Single-camera pose estimation using mirage. IET Comput Vis 12(5):720–727
Su H, Jampani V, Sun D, Maji S, Kalogerakis E, Yang MH, Kautz J (2018) Splatnet: Sparse lattice networks for point cloud processing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2530–2539
Su H, Maji S, Kalogerakis E, Learned-Miller E (2015) Multi-view convolutional neural networks for 3d shape recognition. In: Proceedings of the IEEE international conference on computer vision, pp. 945–953
Sundermeyer M, Marton ZC, Durner M, Brucker M, Triebel R (2018) Implicit 3d orientation learning for 6d object detection from rgb images. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 699–715
Tatarchenko M, Dosovitskiy A, Brox T (2017) Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2088–2096
Tchapmi L, Choy C, Armeni I, Gwak J, Savarese S (2017) Segcloud: semantic segmentation of 3d point clouds. In: 2017 international conference on 3D vision (3DV), pp. 537–547. IEEE
Tejani A, Tang D, Kouskouridas R, Kim TK (2014) Latent-class hough forests for 3d object detection and pose estimation. In: European Conference on Computer Vision, pp. 462–477. Springer
Tekin B, Sinha SN, Fua P (2018) Real-time seamless single shot 6d object pose prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 292–301
Tulsiani S, Malik J (2015) Viewpoints and keypoints. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1510–1519
Wang C, Xu D, Zhu Y, Martín-Martín R, Lu C, Fei-Fei L, Savarese S (2019) Densefusion: 6d object pose estimation by iterative dense fusion. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp. 3343–3352
Wang PS, Liu Y, Guo YX, Sun CY, Tong X (2017) O-cnn: octree-based convolutional neural networks for 3d shape analysis. ACM Trans Gr (TOG) 36(4):1–11
Wohlhart P, Lepetit V (2015) Learning descriptors for object recognition and 3d pose estimation. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp. 3109–3118
Wu X, Lu C, Gu C, Wu K, Zhu S (2021) Domain adaptation for viewpoint estimation with image generation. In: 2021 International Conference on control, automation and information sciences (ICCAIS), pp. 341–346. IEEE
Wu Z, Song S, Khosla A, Yu F, Zhang L, Tang X, Xiao J (2015) 3d shapenets: A deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp. 1912–1920
Xiang Y, Choi W, Lin Y, Savarese S (2015) Data-driven 3d voxel patterns for object category recognition. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp. 1903–1911
Xiang Y, Choi W, Lin Y, Savarese S (2017) Subcategory-aware convolutional neural networks for object proposals and detection. In: 2017 IEEE winter conference on applications of computer vision (WACV), pp. 924–933. IEEE
Xiang Y, Schmidt T, Narayanan V, Fox D (2017) Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199
Xie S, Liu S, Chen Z, Tu Z (2018) Attentional shapecontextnet for point cloud recognition. In: Proceedings of the IEEE Conference on Computer vision and pattern recognition, pp. 4606–4615
Xu D, Anguelov D, Jain A (2018) Pointfusion: Deep sensor fusion for 3d bounding box estimation. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp. 244–253
Zhang X, Jiang Z, Zhang H, Wei Q (2018) Vision-based pose estimation for textureless space objects by contour points matching. IEEE Trans Aerosp Electron Syst 54(5):2342–2355
Zhao S, Gu C, Lu C, Huang Y, Wu K, Guan X (2019) Pointdon: A shape pattern aggregation module for deep learning on point cloud. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE
Zhu M, Derpanis KG, Yang Y, Brahmbhatt S, Zhang M, Phillips C, Lecce M, Daniilidis K (2014) Single image 3d object detection and pose estimation for grasping. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 3936–3943. IEEE
Acknowledgements
This work is supported by National Key R&D Program of China No. 2018YFB1703201, Shanghai Action Plan for Science and Technology Innovation No. 19511109500, Chinese Ministry of Education Research Fund on Intelligent Manufacturing No. MCM20180703, the Eighth Research Institute Fund on Industry-University-Research of China Aerospace Science and Technology Corporation No. USCAST2020-6.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Gu, C., Feng, Q., Lu, C. et al. Segmentation based 6D pose estimation using integrated shape pattern and RGB information. Pattern Anal Applic 25, 1055–1073 (2022). https://doi.org/10.1007/s10044-022-01078-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10044-022-01078-z