Learning Cross-Domain Descriptors for 2D-3D Matching with Hard Triplet Loss and Spatial Transformer Network

Lai, Baiqi; Liu, Weiquan; Wang, Cheng; Bian, Xuesheng; Su, Yanfei; Lin, Xiuhong; Yuan, Zhimin; Shen, Siqi; Cheng, Ming

doi:10.1007/978-3-030-87361-5_2


Baiqi Lai¹⁴,
Weiquan Liu ORCID: orcid.org/0000-0002-5934-1139¹⁴,
Cheng Wang ORCID: orcid.org/0000-0001-6075-796X¹⁴,
Xuesheng Bian¹⁴,
Yanfei Su¹⁴,
Xiuhong Lin¹⁴,
Zhimin Yuan¹⁴,
Siqi Shen¹⁴ &
Ming Cheng¹⁴

Conference paper
First Online: 30 September 2021


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12890)

Abstract

2D-3D matching determines the spatial relationship between 2D and 3D space. It can be used for Augmented Reality (AR) and robot pose estimation, and it supports multi-sensor fusion. Extracting cross-domain descriptors from 2D images and 3D point clouds is one way to achieve 2D-3D matching: 3D point cloud volumes and 2D image patches are sampled around the keypoints of the point clouds and images, and these samples are used to learn the cross-domain descriptors. However, 2D-3D matching is difficult to achieve with handcrafted descriptors, while learned cross-domain descriptors are vulnerable to translation, scaling, and rotation of the cross-domain data. In this paper, we propose a novel network, HAS-Net, for learning cross-domain descriptors to match 2D image patches with 3D point cloud volumes. HAS-Net introduces a spatial transformer network (STN) to handle translation, scaling, rotation, and more generic warping of 2D image patches. In addition, HAS-Net uses the negative sampling strategy of hard triplet loss to remove the uncertainty of randomly sampling negatives during training, thereby improving its ability to distinguish the hardest samples. Experiments demonstrate the superiority of HAS-Net on 2D-3D retrieval and matching. To demonstrate the robustness of the learned descriptors, the 3D descriptors learned by HAS-Net are also applied to 3D global registration.
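The abstract does not give the loss formula, so the following is only a minimal sketch of what a triplet loss with hardest in-batch negative sampling can look like. It assumes that each batch holds matching pairs (2D patch descriptor i, 3D volume descriptor i), that every other descriptor in the batch counts as a negative, and that distances are Euclidean. The function names, the margin value, and the toy descriptors are illustrative assumptions, not details taken from the paper.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hard_triplet_loss(desc_2d, desc_3d, margin=1.0):
    """Mean triplet loss using the hardest in-batch negative for each pair.

    Assumption: desc_2d[i] and desc_3d[i] describe the same keypoint, and
    every other index in the batch is treated as a negative.
    """
    n = len(desc_2d)
    if n != len(desc_3d) or n < 2:
        raise ValueError("need two equally sized batches with at least 2 pairs")

    # dist[i][j]: distance from 2D descriptor i to 3D descriptor j.
    dist = [[l2_distance(a, p) for p in desc_3d] for a in desc_2d]

    total = 0.0
    for i in range(n):
        positive = dist[i][i]
        # Hardest negative: the closest non-matching 3D descriptor to 2D
        # descriptor i (row i) or the closest non-matching 2D descriptor
        # to 3D descriptor i (column i), whichever is nearer.
        hardest_negative = min(
            min(dist[i][j], dist[j][i]) for j in range(n) if j != i
        )
        total += max(0.0, margin + positive - hardest_negative)
    return total / n


# Hypothetical toy batches of 2-element descriptors.
# Well separated: each pair matches exactly and negatives are far away.
loss_separated = hard_triplet_loss(
    [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]],
    [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]],
)

# Confusing: the 3D descriptor of pair 1 coincides with 2D descriptor 0.
loss_confusing = hard_triplet_loss(
    [[0.0, 0.0], [3.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
)
```

In the well-separated batch every hardest negative is farther away than the margin, so no pair contributes to the loss. In the confusing batch the nearest negative is closer than the positive, so both pairs are penalised. Choosing the nearest negative deterministically, rather than drawing one at random, is the idea the abstract refers to as removing the uncertainty of random negative sampling.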



Acknowledgements

This work was funded by China Postdoctoral Science Foundation (No. 2021M690094).

Author information

Authors and Affiliations

Fujian Key Laboratory of Sensing and Computing for Smart Cities, School of Informatics, Xiamen University, Xiamen, 361005, China
Baiqi Lai, Weiquan Liu, Cheng Wang, Xuesheng Bian, Yanfei Su, Xiuhong Lin, Zhimin Yuan, Siqi Shen & Ming Cheng


Corresponding author

Correspondence to Weiquan Liu.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Yuxin Peng
Tsinghua University, Beijing, China
Shi-Min Hu
Tampere University, Tampere, Finland
Moncef Gabbouj
Zhejiang University, Hangzhou, China
Kun Zhou
Technion – Israel Institute of Technology, Haifa, Israel
Michael Elad
Tsinghua University, Beijing, China
Kun Xu


About this paper

Cite this paper

Lai, B. et al. (2021). Learning Cross-Domain Descriptors for 2D-3D Matching with Hard Triplet Loss and Spatial Transformer Network. In: Peng, Y., Hu, SM., Gabbouj, M., Zhou, K., Elad, M., Xu, K. (eds) Image and Graphics. ICIG 2021. Lecture Notes in Computer Science, vol 12890. Springer, Cham. https://doi.org/10.1007/978-3-030-87361-5_2


DOI: https://doi.org/10.1007/978-3-030-87361-5_2
Published: 30 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87360-8
Online ISBN: 978-3-030-87361-5
eBook Packages: Computer Science, Computer Science (R0)
