Abstract
Recognizing 3D objects from local feature descriptors in point cloud scenes with occlusion and clutter is a very challenging task. Most existing 3D local feature descriptors rely on normal information to encode local features; however, they ignore the normal-sign-ambiguity issue, which greatly limits their descriptiveness and robustness. This paper proposes a method called VOxelization in Invariant Distance space (VOID) for 3D object recognition. First, we propose a VOID descriptor that is invariant to normal-sign-ambiguity and is also rotation-invariant, distinctive, robust, and efficient. Second, we propose a VOID-based 3D object recognition method that considers the self-similarity between local features to enhance recognition performance. Five standard datasets are employed to validate the proposed method and to compare it with state-of-the-art methods. The results suggest that: (1) the VOID descriptor is invariant to normal-sign-ambiguity, distinctive, and robust; (2) VOID-based 3D object recognition achieves outstanding recognition performance, i.e., 99.47%, 93.07% and 99.18% on the U3OR, Queen’s and Ca’ Foscari Venezia datasets, respectively.
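The normal-sign-ambiguity issue named in the abstract can be illustrated with a small sketch. It is not the VOID descriptor, whose construction the abstract does not describe; it only shows, under assumed conditions, why a feature built on a signed normal changes when the estimated normal flips, while a feature built on an unsigned distance does not. The synthetic patch and the helper names `pca_normal`, `signed_heights`, and `unsigned_heights` are hypothetical and chosen for illustration.

```python
import numpy as np

def pca_normal(neighbors):
    """Unit normal estimated as the eigenvector of the smallest covariance eigenvalue.
    The sign of this vector is arbitrary, which is the ambiguity in question."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues are returned in ascending order
    return eigvecs[:, 0]

def signed_heights(point, normal, neighbors):
    """Signed distance of each neighbor to the tangent plane at `point`."""
    return (neighbors - point) @ normal

def unsigned_heights(point, normal, neighbors):
    """Absolute distance to the tangent plane; unchanged if the normal is negated."""
    return np.abs((neighbors - point) @ normal)

# Assumed toy data: a noisy, slightly tilted planar patch.
rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
z = 0.1 * xy[:, 0] + 0.01 * rng.standard_normal(200)
patch = np.column_stack([xy, z])

p = patch[0]
n = pca_normal(patch)
flipped = -n  # an equally valid estimate of the same normal

# The signed feature differs between the two estimates; the unsigned one does not.
signed_changes = not np.allclose(signed_heights(p, n, patch),
                                 signed_heights(p, flipped, patch))
unsigned_same = np.allclose(unsigned_heights(p, n, patch),
                            unsigned_heights(p, flipped, patch))
print("signed feature changes under flip:", signed_changes)
print("unsigned feature unchanged under flip:", unsigned_same)
```

Taking an absolute value is only one simple way to obtain sign invariance, and it discards the side of the surface on which a neighbor lies; it is shown here as an assumption-laden stand-in for the general idea rather than as the approach taken in the paper.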

Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (No. 62002295 and 62006025), the Ningbo Natural Science Foundation (No. 202003N4058), the China Postdoctoral Science Foundation (No. 2020M673319), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2021JQ-290 and 2020JQ-210), the State Key Laboratory of Rail Transit Engineering Informatization (FSDI) [Contract No. SKLKZ21-02], and the Fundamental Research Funds for the Central Universities (No. 3102019QD1002).
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
About this article
Cite this article
Yang, J., Fan, S., Huang, Z. et al. VOID: 3D object recognition based on voxelization in invariant distance space. Vis Comput 39, 3073–3089 (2023). https://doi.org/10.1007/s00371-022-02514-1
DOI: https://doi.org/10.1007/s00371-022-02514-1