
2D3D-MVPNet: Learning cross-domain feature descriptors for 2D-3D matching based on multi-view projections of point clouds


Abstract

Robust local cross-domain feature descriptors for 2D images and 3D point clouds play an important role in 2D and 3D vision applications, e.g., augmented reality (AR) and robot navigation. Essentially, such descriptors have the potential to establish a spatial relationship between 2D space and 3D space. However, it is challenging for handcrafted or conventional deep learning-based methods to represent invariant cross-domain feature descriptors between 2D images and 3D point clouds. Specifically, mainstream point cloud deep learning networks extract the global structural information of a scene, and because of the dimensional difference, a large gap remains between 2D image features and 3D structural features. In this paper, based on a dataset of 2D image patches and 3D point cloud volumes, a novel network, 2D3D-MVPNet, is proposed to jointly learn robust local cross-domain feature descriptors between 2D images and 3D point clouds. 2D3D-MVPNet contains a point cloud branch and an image branch, which are optimized with a triplet loss and a second-order similarity regularization. For the point cloud branch, first, a novel point cloud feature descriptor extractor, the image-based point cloud encoder, is introduced to learn local 3D feature descriptors consistent with the local 2D feature descriptors, so that the local 3D descriptors contain both geometry and colour texture information. Second, to overcome the random order of the projected image inputs, a symmetric function is introduced to combine the features of the point cloud projections. Experiments show that the local cross-domain feature descriptors of 2D images and 3D point clouds learned by 2D3D-MVPNet achieve excellent 2D-to-3D retrieval performance. In addition, several 3D point cloud registration results demonstrate the effectiveness of the image-based point cloud encoder.
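To make the mechanisms named above concrete, here is a minimal sketch (not the authors' implementation) in PyTorch: a shared 2D CNN encodes each projection of a local point-cloud volume, a symmetric max-pool fuses the per-view features so the descriptor is invariant to the order of the projections, and training combines a triplet loss with a SOSNet-style second-order similarity regularizer. All names (MultiViewEncoder, sos_regularizer), layer sizes, and the margin value are illustrative assumptions.

```python
# Hypothetical sketch of the ideas in the abstract, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewEncoder(nn.Module):
    """Encode V projected views of a local 3D volume into one descriptor."""
    def __init__(self, dim=256):
        super().__init__()
        # Shared-weight 2D CNN applied to every projection.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, views):                          # views: (B, V, 3, H, W)
        B, V = views.shape[:2]
        f = self.cnn(views.flatten(0, 1)).flatten(1)   # (B*V, 64)
        # Symmetric function: max-pool over the view axis, so the output
        # does not depend on the (random) order of the projections.
        f = f.view(B, V, -1).amax(dim=1)               # (B, 64)
        return F.normalize(self.fc(f), dim=1)          # unit-norm descriptor

def triplet_loss(anchor, pos, neg, margin=1.0):
    """Pull matching 2D/3D descriptors together, push non-matches apart."""
    d_pos = (anchor - pos).pow(2).sum(1)
    d_neg = (anchor - neg).pow(2).sum(1)
    return F.relu(d_pos - d_neg + margin).mean()

def sos_regularizer(anchor, pos):
    """Second-order similarity: matching pairs should show similar
    distance patterns to the rest of the batch (SOSNet-style)."""
    da = torch.cdist(anchor, anchor)                   # (B, B) pairwise distances
    dp = torch.cdist(pos, pos)
    return (da - dp).pow(2).sum(1).clamp_min(1e-12).sqrt().mean()
```

In a training step, the image branch would produce anchor descriptors from 2D patches while this encoder produces pos/neg descriptors from matching and non-matching point cloud volumes; the total objective would then be triplet_loss(anchor, pos, neg) plus a weighted sos_regularizer(anchor, pos), with the weight left as a tunable assumption.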


Acknowledgements

This work is supported in part by the China Postdoctoral Science Foundation (No. 2021M690094), in part by the National Natural Science Foundation of China (Nos. 61971363, U1605254, 61872306, 61701191, 41871380), in part by the Natural Science Fund of Fujian Province (No. 2018J05108), in part by the Xiamen Science and Technology Bureau (No. 3502Z20193017), and in part by the China Fundamental Research Funds for the Central Universities (No. 20720210074). We also thank Associate Professor Yu Zang from the School of Informatics, Xiamen University, who helped us reorganize the logic and language of this paper during the rebuttal process.

Author information


Corresponding author

Correspondence to Weiquan Liu.


About this article


Cite this article

Lai, B., Liu, W., Wang, C. et al. 2D3D-MVPNet: Learning cross-domain feature descriptors for 2D-3D matching based on multi-view projections of point clouds. Appl Intell 52, 14178–14193 (2022). https://doi.org/10.1007/s10489-022-03372-z

