Abstract
Fine-grained 3D shape recognition (FGSR) is crucial for real-world applications. Existing methods struggle to achieve high accuracy for FGSR because of the high similarity and low dissimilarity among sub-categories, especially in the absence of part location or attribute annotations. In this paper, we propose V\(^2\)MLP, a multi-view representation-oriented MLP network dedicated to FGSR that uses only class labels as supervision. V\(^2\)MLP comprises two key modules: the cross-view interaction MLP (CVI-MLP) and the cross-view fusion MLP (CVF-MLP). The CVI-MLP module captures contextual information, including local and global contexts, through cross-view interactions to extract discriminative view features that reinforce the subtle differences between sub-categories. The CVF-MLP module then aggregates the view features along the spatial and view dimensions to obtain the final 3D shape features, minimizing information loss during view feature fusion. Extensive experiments on the three categories of the FG3D dataset demonstrate the effectiveness of V\(^2\)MLP in learning discriminative features for 3D shapes, achieving state-of-the-art accuracy for FGSR. In addition, V\(^2\)MLP performs competitively for meta-category recognition on the ModelNet40 dataset.
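The abstract describes the pipeline only at a high level. As a rough illustration of the general multi-view MLP idea (not the authors' exact CVI-MLP/CVF-MLP design), the following PyTorch sketch shows one way to mix information across views with an MLP-Mixer-style block and then fuse the per-view features into a single shape descriptor. All module names, dimensions, and the learned-pooling fusion are assumptions made for illustration; spatial aggregation is omitted because the sketch starts from already-pooled per-view features.

```python
# Hypothetical sketch of a multi-view MLP pipeline (not the published V^2MLP code).
import torch
import torch.nn as nn


class CrossViewInteraction(nn.Module):
    """Assumed MLP-Mixer-style block: mix features across the view axis, then across channels."""

    def __init__(self, num_views: int, dim: int, hidden: int = 256):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.view_mlp = nn.Sequential(
            nn.Linear(num_views, hidden), nn.GELU(), nn.Linear(hidden, num_views)
        )
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_views, dim)
        y = self.norm(x).transpose(1, 2)           # (batch, dim, num_views): mix across views
        x = x + self.view_mlp(y).transpose(1, 2)
        x = x + self.channel_mlp(self.norm(x))     # mix across channels
        return x


class CrossViewFusion(nn.Module):
    """Assumed fusion: a learned softmax pooling over the view dimension."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.score(x), dim=1)     # (batch, num_views, 1) view weights
        return (w * x).sum(dim=1)                   # (batch, dim) shape descriptor


class MultiViewMLP(nn.Module):
    def __init__(self, num_views: int = 12, dim: int = 512, num_classes: int = 13):
        super().__init__()
        self.interaction = CrossViewInteraction(num_views, dim)
        self.fusion = CrossViewFusion(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: per-view features from a shared image backbone, (batch, num_views, dim)
        return self.head(self.fusion(self.interaction(view_feats)))


if __name__ == "__main__":
    model = MultiViewMLP()
    logits = model(torch.randn(2, 12, 512))
    print(logits.shape)  # torch.Size([2, 13])
```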







Data availability statement
The data that support the findings of this study are available from the corresponding author, Jing Bai, upon reasonable request.
Funding
This work was supported in part by the National Natural Science Foundation of China (62162001, 61762003), the Natural Science Foundation of Ningxia Province of China (2022AAC02041), and the Ningxia Excellent Talent Program.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Consent for publication
This manuscript has been approved by all authors for publication. On behalf of my co-authors, I declare that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
A.1 Visualization of the complete classification confusion matrix
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zheng, L., Bai, J., Bai, S. et al. V\(^2\)MLP: an accurate and simple multi-view MLP network for fine-grained 3D shape recognition. Vis Comput 40, 6655–6670 (2024). https://doi.org/10.1007/s00371-023-03191-4
DOI: https://doi.org/10.1007/s00371-023-03191-4