Abstract
With the rapid development of 3D techniques and the resulting growth of large-scale 3D data, 3D model classification has attracted considerable attention from both the research and industrial communities. Most current methods depend heavily on abundant labeled 3D models, which substantially restricts their scalability to novel classes with few annotated training samples, since scarce labels increase the risk of overfitting. Moreover, these methods exploit only single-modal information (either point clouds or multi-view images), and few works integrate this complementary information for 3D model representation. To overcome these problems, we propose a multi-modal meta-transfer fusion network (M\(^{3}\)TF), the key of which is to perform few-shot multi-modal representation for 3D model classification. Specifically, we first convert the original 3D data into both the multi-view and point-cloud modalities, and pre-train an individual encoding network for each modality on a large-scale dataset to obtain good initial parameters, which benefits the subsequent few-shot learning tasks. Then, to adapt the network to few-shot learning, we meta-update the parameters of the scaling-and-shifting (SS) operation, the multi-modal representation fusion (MMRF) module, and the 3D model classifier to obtain optimal initializations. Because training the large number of parameters in the feature extractors would increase the risk of overfitting, we freeze the extractors and introduce the SS operation to modulate their weights; SS reduces the number of trainable parameters by up to 20%, which effectively mitigates overfitting. MMRF adaptively integrates the multi-modal information according to each modality's significance for the 3D model, yielding a more robust 3D representation. Since no dataset is available for evaluating this new task, we build three 3D CAD datasets, Meta-ModelNet, Meta-ShapeNet and Meta-RGBD, and implement representative methods on them for fair comparison. Extensive experimental results demonstrate the superiority of the proposed method.
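To make the two mechanisms named in the abstract concrete, the sketch below illustrates (i) a scaling-and-shifting (SS) layer that keeps a pre-trained weight matrix frozen and trains only lightweight per-channel scale/shift parameters, in the spirit of the meta-transfer learning of Sun et al. (2019), and (ii) a gated fusion module in the spirit of MMRF that weights the multi-view and point-cloud features by their predicted significance. This is a minimal PyTorch sketch under our own assumptions: the class names `SSLinear` and `MMRF`, the per-output-channel modulation, and the softmax gate are illustrative choices, not the authors' released implementation.

```python
# Minimal sketch of the two components named in the abstract (PyTorch).
# Class names and the gating design are illustrative assumptions,
# not the authors' exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SSLinear(nn.Module):
    """Frozen linear layer modulated by trainable scale/shift parameters.

    Only `scale` and `shift` receive gradients during meta-transfer, so
    the per-task trainable parameter count stays far below full fine-tuning.
    """

    def __init__(self, weight: torch.Tensor, bias: torch.Tensor):
        super().__init__()
        # Pre-trained parameters stay frozen (registered as buffers).
        self.register_buffer("weight", weight)
        self.register_buffer("bias", bias)
        # One scaling and one shifting parameter per output channel.
        self.scale = nn.Parameter(torch.ones(weight.size(0)))
        self.shift = nn.Parameter(torch.zeros(weight.size(0)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = (W * scale) x + (b + shift)
        w = self.weight * self.scale.unsqueeze(1)
        return F.linear(x, w, self.bias + self.shift)


class MMRF(nn.Module):
    """Adaptive fusion of multi-view and point-cloud features.

    A small gating network scores each modality; the fused descriptor is
    the importance-weighted sum of the two modal features.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)

    def forward(self, f_view: torch.Tensor, f_point: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(
            self.gate(torch.cat([f_view, f_point], dim=-1)), dim=-1
        )
        return weights[..., 0:1] * f_view + weights[..., 1:2] * f_point


# Toy usage: a 5-way episode with 512-d features from each modality encoder.
f_view, f_point = torch.randn(5, 512), torch.randn(5, 512)
layer = SSLinear(torch.randn(512, 512), torch.zeros(512))
fused = MMRF(512)(layer(f_view), f_point)  # -> (5, 512) fused descriptors
```

Under this reading, only the SS parameters, the fusion gate, and the final classifier would receive gradients during meta-training, which is what keeps the number of trainable parameters small and curbs overfitting on few-shot episodes.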








Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (U21B2024).
Additional information
Communicated by Jianfei Cai.
About this article
Cite this article
Zhou, HY., Liu, AA., Zhang, CY. et al. Multi-Modal Meta-Transfer Fusion Network for Few-Shot 3D Model Classification. Int J Comput Vis 132, 673–688 (2024). https://doi.org/10.1007/s11263-023-01905-8