
FuseNet: a multi-modal feature fusion network for 3D shape classification

Research. Published in The Visual Computer.

Abstract

Recent research on 3D shape classification has focused mainly on point cloud and multi-view methods. However, multi-view approaches inevitably lose structural information about 3D shapes because of camera-angle limitations, while point cloud methods apply max pooling over all points to obtain a global feature, which discards local detail. These weaknesses limit the performance of 3D shape classification. This paper proposes FuseNet, a novel model that integrates multi-view and point cloud information and significantly improves the accuracy of 3D model classification. First, we propose a multi-view part and a point cloud part that extract raw features from different convolution layers of the multi-view and point cloud inputs. Second, we adopt a multi-view pooling method that fuses features across views so that features from different convolution layers are integrated more effectively, and we propose an attention-based fusion block that integrates the point cloud and multi-view features. Finally, we test our method extensively on three benchmark datasets: ModelNet10, ModelNet40, and ShapeNet Core55. The experimental results show classification performance superior or comparable to previous state-of-the-art techniques for 3D shape classification.
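
The pooling and fusion ideas named in the abstract can be illustrated with a small sketch. This is not the FuseNet implementation, whose details are not given on this page. It only shows, on toy numbers, how max pooling collapses per-point or per-view features into one global vector and how a softmax-weighted sum can combine two modalities. The function names, the toy feature values, and the scoring rule (mean activation instead of a learned attention layer) are all assumptions made for illustration.

```python
import math


def max_pool(features):
    """Element-wise max over a list of equal-length feature vectors.

    Each channel keeps only its largest value, so information about
    which point or view produced it is discarded.
    """
    return [max(channel) for channel in zip(*features)]


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(point_feat, view_feat):
    """Fuse two global features with softmax weights that sum to 1.

    Assumption: each modality is scored by its mean activation.
    A real attention block would learn this scoring function.
    """
    scores = [
        sum(point_feat) / len(point_feat),
        sum(view_feat) / len(view_feat),
    ]
    w_point, w_view = softmax(scores)
    return [w_point * p + w_view * v for p, v in zip(point_feat, view_feat)]


# Toy data: 4 points and 2 views, each described by 3 channels.
point_features = [
    [0.1, 0.9, 0.2],
    [0.4, 0.3, 0.8],
    [0.7, 0.1, 0.5],
    [0.2, 0.6, 0.3],
]
view_features = [
    [0.5, 0.2, 0.9],
    [0.3, 0.8, 0.1],
]

global_point = max_pool(point_features)  # one vector for the whole point cloud
global_view = max_pool(view_features)    # one vector for all rendered views
fused = attention_fuse(global_point, global_view)

print(global_point, global_view, fused)
```

Because the two weights sum to 1, every fused channel lies between the corresponding point cloud and multi-view values. In a trained network the weights would depend on the input shape, rather than being fixed by a hand-written rule as in this sketch.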



Acknowledgements

We would like to thank the anonymous reviewers for their helpful suggestions. This work was supported by the National Natural Science Foundation of China (Grant No. 62106227), the China Postdoctoral Science Foundation (Grant No. 2023M743132), and the “Teacher Professional Development Project” for Domestic Visiting Scholars in 2023 (Project No. FX2023007).

Author information

Contributions

Xin Zhao contributed to conceptualization, methodology, and writing of the original draft. Yinhuang Chen performed English-language polishing and error checking. Chengzhuan Yang contributed to review and editing of the manuscript, funding acquisition, and supervision. Lincong Fang contributed to investigation, software, and data curation.

Corresponding author

Correspondence to Chengzhuan Yang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, X., Chen, Y., Yang, C. et al. FuseNet: a multi-modal feature fusion network for 3D shape classification. Vis Comput 41, 2973–2985 (2025). https://doi.org/10.1007/s00371-024-03581-2


  • DOI: https://doi.org/10.1007/s00371-024-03581-2

Keywords