Abstract
Recent research on 3D shape classification has focused mainly on point cloud and multi-view methods. Multi-view approaches, however, inevitably lose structural information about 3D shapes because of the limited set of camera angles. Point cloud methods typically apply max pooling over all point features to obtain a single global feature, which discards local detail. These weaknesses limit the classification performance of both families of methods. This paper proposes FuseNet, a novel model that integrates multi-view and point cloud information and significantly improves the accuracy of 3D shape classification. First, we propose a multi-view and point cloud part that extracts raw features from different convolution layers for the multiple views and the point cloud. Second, we adopt a multi-view pooling method that fuses features across views so that features from different convolution layers are integrated more effectively, and we propose an attention-based multi-view and point cloud fusion block that integrates point cloud and multi-view features. Finally, we evaluate our method extensively on three benchmark datasets: ModelNet10, ModelNet40, and ShapeNet Core55. The experimental results show classification performance superior or comparable to previous state-of-the-art techniques for 3D shape classification.
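The three operations named in the abstract (max pooling over points, pooling across views, and attention-based fusion) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the fusion block, so the sigmoid gate below, the function names (max_pool, attention_fuse), the feature sizes, and the random weights are all assumptions chosen only to make the idea concrete.

```python
import math
import random


def max_pool(rows):
    """Element-wise max over a list of equal-length feature vectors."""
    return [max(col) for col in zip(*rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_fuse(point_vec, view_vec, weights, bias):
    """Hypothetical fusion: gate the view feature with attention weights
    computed from the point feature, then concatenate both vectors.

    weights holds one row of per-input weights for each output channel;
    in a trained model these would be learned, here they are random.
    """
    gate = [
        sigmoid(sum(p * w for p, w in zip(point_vec, row)) + b)
        for row, b in zip(weights, bias)
    ]
    gated_view = [g * v for g, v in zip(gate, view_vec)]
    return point_vec + gated_view  # list concatenation


random.seed(0)
C = 8  # assumed feature width
point_feats = [[random.gauss(0, 1) for _ in range(C)] for _ in range(256)]  # 256 points
view_feats = [[random.gauss(0, 1) for _ in range(C)] for _ in range(12)]    # 12 views
weights = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C)]
bias = [0.0] * C

point_vec = max_pool(point_feats)  # global point cloud feature
view_vec = max_pool(view_feats)    # pooled multi-view feature
fused = attention_fuse(point_vec, view_vec, weights, bias)
print(len(fused))  # 16
```

Note that max_pool keeps only the largest response per channel, which is why a single global point feature loses local detail; the gate lets one modality decide how much of the other to pass through.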





Acknowledgements
We would like to thank the anonymous reviewers for their helpful suggestions. This work was supported by the National Natural Science Foundation of China (Grant No. 62106227), the China Postdoctoral Science Foundation (Grant No. 2023M743132), and the “Teacher Professional Development Project” for Domestic Visiting Scholars in 2023 (Project No. FX2023007).
Author information
Contributions
Xin Zhao contributed to conceptualization, methodology, and writing of the original draft. Yinhuang Chen performed English polishing and error checking. Chengzhuan Yang contributed to reviewing and editing, funding acquisition, and supervision. Lincong Fang contributed to investigation, software, and data curation.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, X., Chen, Y., Yang, C. et al. FuseNet: a multi-modal feature fusion network for 3D shape classification. Vis Comput 41, 2973–2985 (2025). https://doi.org/10.1007/s00371-024-03581-2
DOI: https://doi.org/10.1007/s00371-024-03581-2