
2D3D-MVPNet: Learning cross-domain feature descriptors for 2D-3D matching based on multi-view projections of point clouds


Abstract

Robust local cross-domain feature descriptors for 2D images and 3D point clouds play an important role in 2D and 3D vision applications, e.g., augmented reality (AR) and robot navigation. Essentially, such descriptors have the potential to establish a spatial relationship between 2D space and 3D space. However, it is challenging for handcrafted or conventional deep learning-based methods to represent invariant cross-domain feature descriptors between 2D images and 3D point clouds. Specifically, mainstream point cloud deep learning networks extract the global structural information of a scene, and because of the dimensional difference, a large gap remains between 2D image features and 3D structural features. In this paper, based on a dataset of 2D image patches and 3D point cloud volumes, a novel network, 2D3D-MVPNet, is proposed to jointly learn robust local cross-domain feature descriptors between 2D images and 3D point clouds. 2D3D-MVPNet contains a point cloud branch and an image branch, which are optimized with a triplet loss and a second-order similarity regularization. For the point cloud branch, first, a novel point cloud feature descriptor extractor, the image-based point cloud encoder, is introduced to learn local 3D feature descriptors consistent with the local 2D feature descriptors, so that the local 3D descriptors contain both geometry and colour texture information. Second, to overcome the random order of the projected image inputs, a symmetric function is introduced to combine the features of the point cloud projections. Experiments show that the local cross-domain feature descriptors of 2D images and 3D point clouds learned by 2D3D-MVPNet achieve excellent 2D-to-3D retrieval performance. In addition, several 3D point cloud registration results demonstrate the effectiveness of the image-based point cloud encoder.
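To make the mechanisms named above concrete, here is a minimal sketch (not the authors' implementation) in PyTorch: a shared 2D CNN encodes each projection of a local point-cloud volume, a symmetric max-pool fuses the per-view features so the descriptor is invariant to the order of the projections, and training combines a triplet loss with a SOSNet-style second-order similarity regularizer. All names (MultiViewEncoder, sos_regularizer), layer sizes, and the margin value are illustrative assumptions.

```python
# Hypothetical sketch of the ideas in the abstract, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewEncoder(nn.Module):
    """Encode V projected views of a local 3D volume into one descriptor."""
    def __init__(self, dim=256):
        super().__init__()
        # Shared-weight 2D CNN applied to every projection.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, views):                          # views: (B, V, 3, H, W)
        B, V = views.shape[:2]
        f = self.cnn(views.flatten(0, 1)).flatten(1)   # (B*V, 64)
        # Symmetric function: max-pool over the view axis, so the output
        # does not depend on the (random) order of the projections.
        f = f.view(B, V, -1).amax(dim=1)               # (B, 64)
        return F.normalize(self.fc(f), dim=1)          # unit-norm descriptor

def triplet_loss(anchor, pos, neg, margin=1.0):
    """Pull matching 2D/3D descriptors together, push non-matches apart."""
    d_pos = (anchor - pos).pow(2).sum(1)
    d_neg = (anchor - neg).pow(2).sum(1)
    return F.relu(d_pos - d_neg + margin).mean()

def sos_regularizer(anchor, pos):
    """Second-order similarity: matching pairs should show similar
    distance patterns to the rest of the batch (SOSNet-style)."""
    da = torch.cdist(anchor, anchor)                   # (B, B) pairwise distances
    dp = torch.cdist(pos, pos)
    return (da - dp).pow(2).sum(1).clamp_min(1e-12).sqrt().mean()
```

In a training step, the image branch would produce anchor descriptors from 2D patches while this encoder produces pos/neg descriptors from matching and non-matching point cloud volumes; the total objective would then be triplet_loss(anchor, pos, neg) plus a weighted sos_regularizer(anchor, pos), with the weight left as a tunable assumption.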


Acknowledgements

This work is supported in part by the China Postdoctoral Science Foundation (No. 2021M690094), in part by the National Natural Science Foundation of China (Nos. 61971363, U1605254, 61872306, 61701191, 41871380), in part by the Natural Science Fund of Fujian Province (No. 2018J05108), in part by the Xiamen Science and Technology Bureau (No. 3502Z20193017), and in part by the China Fundamental Research Funds for the Central Universities (No. 20720210074). We also thank Associate Professor Yu Zang from the School of Informatics, Xiamen University, who helped us reorganize the logic and language of this paper during the rebuttal process.

Author information


Corresponding author

Correspondence to Weiquan Liu.


About this article


Cite this article

Lai, B., Liu, W., Wang, C. et al. 2D3D-MVPNet: Learning cross-domain feature descriptors for 2D-3D matching based on multi-view projections of point clouds. Appl Intell 52, 14178–14193 (2022). https://doi.org/10.1007/s10489-022-03372-z

