Learning Cross-Domain Descriptors for 2D-3D Matching with Hard Triplet Loss and Spatial Transformer Network

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12890)

Abstract

2D-3D matching determines the spatial relationship between 2D and 3D space. It can be used for Augmented Reality (AR) and robot pose estimation, and it supports multi-sensor fusion. One way to achieve 2D-3D matching is to extract cross-domain descriptors from 2D images and 3D point clouds: 2D image patches and 3D point cloud volumes are sampled around the keypoints of the images and point clouds, and these samples are used to learn the cross-domain descriptors. However, 2D-3D matching is difficult to achieve with handcrafted descriptors, and learned cross-domain descriptors are vulnerable to translation, scale, and rotation of the cross-domain data. In this paper, we propose a novel network, HAS-Net, that learns cross-domain descriptors for matching 2D image patches with 3D point cloud volumes. HAS-Net introduces a spatial transformer network (STN) to handle translation, scale, rotation, and more generic warping of 2D image patches. In addition, HAS-Net adopts the negative sampling strategy of the hard triplet loss, which removes the uncertainty of randomly sampling negatives during training and thereby improves the ability to distinguish the hardest samples. Experiments demonstrate the superiority of HAS-Net on 2D-3D retrieval and matching. To demonstrate the robustness of the learned descriptors, the 3D descriptors learned by HAS-Net are also applied to 3D global registration.
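The hard negative sampling idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a hardest-in-batch formulation in which the i-th 2D descriptor and the i-th 3D descriptor form a matching pair and every other descriptor in the batch is a candidate negative. The function names, the margin value, and the use of Euclidean distance are assumptions made for illustration; the abstract does not specify the exact loss used by HAS-Net, and real training would use a tensor library rather than plain Python lists.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_triplet_loss(desc_2d, desc_3d, margin=1.0):
    """Hardest-in-batch triplet loss for paired 2D/3D descriptors.

    desc_2d[i] and desc_3d[i] are assumed to describe the same keypoint.
    The margin default is an illustrative assumption, not a value from the paper.
    """
    n = len(desc_2d)
    if n != len(desc_3d) or n < 2:
        raise ValueError("need at least two matching 2D/3D descriptor pairs")

    # dist[i][j]: distance between 2D descriptor i and 3D descriptor j.
    dist = [[l2_distance(p, v) for v in desc_3d] for p in desc_2d]

    total = 0.0
    for i in range(n):
        positive = dist[i][i]
        # Hardest negative: the closest non-matching descriptor, searched in
        # both directions (row i: 3D negatives, column i: 2D negatives).
        hardest_negative = min(
            min(dist[i][j], dist[j][i]) for j in range(n) if j != i
        )
        # Hinge: penalize only when the negative is not at least `margin`
        # farther away than the positive.
        total += max(0.0, margin + positive - hardest_negative)
    return total / n
```

Selecting the closest non-matching descriptor deterministically, rather than drawing a random negative, is what removes the sampling uncertainty the abstract refers to: each training step always confronts the pair that is currently most easily confused.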

Acknowledgements

This work was funded by the China Postdoctoral Science Foundation (No. 2021M690094).

Author information

Corresponding author

Correspondence to Weiquan Liu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Lai, B. et al. (2021). Learning Cross-Domain Descriptors for 2D-3D Matching with Hard Triplet Loss and Spatial Transformer Network. In: Peng, Y., Hu, SM., Gabbouj, M., Zhou, K., Elad, M., Xu, K. (eds) Image and Graphics. ICIG 2021. Lecture Notes in Computer Science, vol 12890. Springer, Cham. https://doi.org/10.1007/978-3-030-87361-5_2

  • DOI: https://doi.org/10.1007/978-3-030-87361-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87360-8

  • Online ISBN: 978-3-030-87361-5

  • eBook Packages: Computer Science, Computer Science (R0)
