
3DTDesc: learning local features using 2D and 3D cues

  • Original Paper
Machine Vision and Applications

A Correction to this article was published on 09 April 2021

This article has been updated

Abstract

Pairwise frame registration using sparse geometric local features is not particularly robust on real-world depth images, because 3D scan data are low-resolution and incomplete. Moreover, a scene may contain many regions with similar geometry. In this paper, we present 3DTDesc, a data-driven descriptor that tightly combines 2D texture and 3D geometric information for frame registration. The descriptor is learned directly from colored point clouds; it is time-efficient and provides robust and accurate geometric feature matching in a variety of settings. In the fusion network, texture and geometric information interact closely and complement each other: geometry compensates in textureless regions, while texture distinguishes regions that have similar geometry but different appearance. We also propose a multi-scale 3DTDesc that further improves feature-matching performance. Extensive experiments on challenging RGB-D datasets, together with various ablation studies, demonstrate the effectiveness and efficiency of 3DTDesc.
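The abstract's claim that texture disambiguates regions with similar geometry can be illustrated with a toy sketch. This is not the 3DTDesc network: 3DTDesc learns its fusion end to end, whereas the fixed normalize-and-concatenate rule in the hypothetical `fuse` function, the concatenation in the hypothetical `multi_scale` function, the brute-force mutual nearest-neighbour matcher, and the 2-D "embeddings" are all assumptions chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(texture: Sequence[float], geometry: Sequence[float]) -> Vector:
    """Hypothetical fixed fusion rule: normalize each cue, concatenate, renormalize.

    3DTDesc learns its fusion inside a network; this rule only shows how a
    single descriptor can carry both the 2D texture and the 3D geometric cue.
    """
    return l2_normalize(l2_normalize(texture) + l2_normalize(geometry))


def multi_scale(per_scale: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical multi-scale variant: concatenate descriptors computed at
    several neighbourhood radii, then renormalize."""
    joined: Vector = []
    for d in per_scale:
        joined.extend(d)
    return l2_normalize(joined)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query: Sequence[float], pool: Sequence[Sequence[float]]) -> int:
    """Index of the descriptor in pool closest to query.

    min() keeps the first index on ties, so identical descriptors all
    collapse onto the same candidate.
    """
    return min(range(len(pool)), key=lambda j: sq_dist(query, pool[j]))


def mutual_matches(src: Sequence[Sequence[float]],
                   dst: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Brute-force mutual nearest-neighbour matching: keep (i, j) only if
    dst[j] is the nearest neighbour of src[i] and vice versa."""
    matches = []
    for i, d in enumerate(src):
        j = nearest(d, dst)
        if nearest(dst[j], src) == i:
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    red, blue = [1.0, 0.0], [0.0, 1.0]  # toy texture embeddings
    flat = [1.0, 0.0]                   # identical toy geometry (e.g. a flat wall)
    # Frame B observes the same two points as frame A, listed in swapped order,
    # so the true correspondences are (0, 1) and (1, 0).
    # Geometry alone is ambiguous; tie-breaking yields one wrong match.
    print(mutual_matches([flat, flat], [flat, flat]))  # [(0, 0)]
    src = [fuse(red, flat), fuse(blue, flat)]
    dst = [fuse(blue, flat), fuse(red, flat)]
    print(mutual_matches(src, dst))  # [(0, 1), (1, 0)]
```

Mutual nearest-neighbour filtering is a simple, common way to discard ambiguous correspondences; in a full registration pipeline the surviving matches would typically be passed to a robust rigid-transform estimator such as RANSAC. The brute-force search costs O(nm) distance evaluations and is used here only for clarity; at scale a k-d tree or an approximate nearest-neighbour library would be the usual choice.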





Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant U1913201.

Author information

Corresponding author

Correspondence to Xiaoxia Xing.



About this article


Cite this article

Xing, X., Cai, Y., Lu, T. et al. 3DTDesc: learning local features using 2D and 3D cues. Machine Vision and Applications 32, 53 (2021). https://doi.org/10.1007/s00138-021-01176-8


  • DOI: https://doi.org/10.1007/s00138-021-01176-8
