3DTDesc: learning local features using 2D and 3D cues

Xing, Xiaoxia; Cai, Yinghao; Lu, Tao; Yang, Yiping; Wen, Dayong

doi:10.1007/s00138-021-01176-8

Original Paper
Published: 03 March 2021

Machine Vision and Applications, Volume 32, article number 53 (2021)

Xiaoxia Xing¹,² (ORCID: orcid.org/0000-0002-1680-093X),
Yinghao Cai¹,
Tao Lu³,
Yiping Yang¹ &
Dayong Wen¹

A Correction to this article was published on 09 April 2021

This article has been updated

Abstract

Pairwise frame registration with sparse geometric local features on real-world depth images is not particularly robust, owing to the low resolution and incompleteness of 3D scan data. Moreover, many regions may share similar geometric information. In this paper, we present 3DTDesc, a data-driven descriptor that tightly combines 2D texture and 3D geometric information for frame registration. The descriptor is learned directly from colored point clouds, which is time-efficient, and it provides robust and accurate geometric feature matching in a variety of settings. Texture and geometric information interact closely in the fusion network and complement each other: geometry helps in textureless regions, while texture helps distinguish regions with similar geometry but different texture. We also propose a multi-scale 3DTDesc to further improve feature matching performance. Extensive experiments on challenging RGB-D datasets, together with various ablation studies, demonstrate the effectiveness and efficiency of 3DTDesc.
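To make the complementarity of the two cues concrete, here is a minimal sketch of descriptor matching with and without texture. It is not the 3DTDesc network: the abstract describes a learned fusion, whereas this toy stand-in assumes a simple concatenation of L2-normalized texture and geometry vectors. The function names (`l2_normalize`, `fuse`, `mutual_nearest_neighbors`) and the toy data are hypothetical and chosen only for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def fuse(texture, geometry):
    """Hypothetical fusion: normalize each cue, then concatenate.
    Assumption for illustration only; the paper learns the fusion in a network."""
    return l2_normalize(texture) + l2_normalize(geometry)

def mutual_nearest_neighbors(desc_a, desc_b):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are each other's nearest neighbor."""
    def nearest(query, candidates):
        # Index of the candidate with the smallest Euclidean distance to the query.
        return min(range(len(candidates)), key=lambda k: math.dist(query, candidates[k]))
    a_to_b = [nearest(d, desc_b) for d in desc_a]
    b_to_a = [nearest(d, desc_a) for d in desc_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

# Toy data: two keypoints per frame with identical geometry but different texture.
# The true correspondences are A0 <-> B1 and A1 <-> B0.
geo_a = [[1.0, 0.0], [1.0, 0.0]]
tex_a = [[1.0, 0.0], [0.0, 1.0]]
geo_b = [[1.0, 0.0], [1.0, 0.0]]
tex_b = [[0.0, 1.0], [1.0, 0.0]]

# Matching on geometry alone cannot tell the two keypoints apart.
geometry_only = mutual_nearest_neighbors(geo_a, geo_b)

# Matching on the fused vectors uses texture to break the tie.
fused_a = [fuse(t, g) for t, g in zip(tex_a, geo_a)]
fused_b = [fuse(t, g) for t, g in zip(tex_b, geo_b)]
fused = mutual_nearest_neighbors(fused_a, fused_b)

print("geometry only:", geometry_only)
print("fused:", fused)
```

With these toy inputs, geometry alone yields the single incorrect pair (0, 0), while the fused descriptors recover both true correspondences, (0, 1) and (1, 0). The mutual nearest-neighbor check is one common way to filter matches before estimating a rigid transform; the abstract does not say which matching strategy the paper uses.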

Change history

09 April 2021
A Correction to this paper has been published: https://doi.org/10.1007/s00138-021-01200-x

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant U1913201.

Author information

Authors and Affiliations

Institute of Automation, University of Chinese Academy of Sciences, No. 95 Zhongguancun East Road, Haidian District, Beijing, 100190, China
Xiaoxia Xing, Yinghao Cai, Yiping Yang & Dayong Wen
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Xiaoxia Xing
Institute of Automation, University of Chinese Academy of Sciences, No. 95 Zhongguancun East Road, Haidian District, Beijing, 100190, China
Tao Lu

Corresponding author

Correspondence to Xiaoxia Xing.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xing, X., Cai, Y., Lu, T. et al. 3DTDesc: learning local features using 2D and 3D cues. Machine Vision and Applications 32, 53 (2021). https://doi.org/10.1007/s00138-021-01176-8

Received: 22 July 2019
Revised: 15 November 2020
Accepted: 03 February 2021
Published: 03 March 2021
DOI: https://doi.org/10.1007/s00138-021-01176-8
