Multi-Modal Meta-Transfer Fusion Network for Few-Shot 3D Model Classification

Published in: International Journal of Computer Vision

Abstract

Driven by the growing interest in 3D techniques and the resulting abundance of large-scale 3D data, 3D model classification has attracted enormous attention from both the research and industry communities. Most current methods depend heavily on sufficient labeled 3D models, which substantially restricts their scalability to novel classes with few annotated training samples, since data scarcity increases the risk of overfitting. Moreover, they leverage only single-modal information (either point cloud or multi-view data), and few works integrate these complementary sources for 3D model representation. To overcome these problems, we propose a multi-modal meta-transfer fusion network (M\(^{3}\)TF), whose key idea is to learn few-shot multi-modal representations for 3D model classification. Specifically, we first convert the original 3D data into both multi-view and point cloud modalities and pre-train individual encoding networks on a large-scale dataset to obtain good initial parameters, which benefits the subsequent few-shot learning tasks. Then, to adapt the network to few-shot learning, we meta-learn the parameters of the Scaling and Shifting (SS) operation, the multi-modal representation fusion (MMRF) module, and the 3D model classifier to obtain an optimal initialization. Because the large number of trainable parameters in the feature extractors increases the chance of overfitting, we freeze the extractors and introduce the SS operation to adjust their weights; SS reduces the number of trainable parameters to as little as 20% of the full network, which effectively mitigates overfitting. MMRF adaptively integrates the multi-modal information according to its significance to the 3D model, yielding a more robust 3D representation. Since no dataset is available for evaluating this new task, we build three 3D CAD datasets, Meta-ModelNet, Meta-ShapeNet and Meta-RGBD, and implement representative methods for fair comparison. Extensive experimental results demonstrate the superiority of the proposed method.
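To make the two adaptation modules concrete, the sketch below gives one plausible PyTorch reading of the Scaling and Shifting (SS) operation and the multi-modal representation fusion (MMRF) described above. This is an illustrative interpretation of the abstract, not the authors' implementation: the per-channel scale/shift parametrisation and the softmax-weighted fusion, along with the class names `ScaleShift` and `MMRF`, are our assumptions.

```python
import torch
import torch.nn as nn


class ScaleShift(nn.Module):
    """Scaling and Shifting (SS), sketched as a per-channel affine map.

    The backbone stays frozen; only gamma (init 1) and beta (init 0) are
    meta-learned, which keeps the trainable parameter count small.
    """

    def __init__(self, num_channels: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_channels))
        self.beta = nn.Parameter(torch.zeros(num_channels))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (batch, num_channels) output of a frozen encoder
        return feat * self.gamma + self.beta


class MMRF(nn.Module):
    """Multi-modal representation fusion, sketched as significance-weighted
    averaging: each modality embedding gets a learned scalar score,
    normalised with a softmax across modalities before combination."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scalar significance per modality

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: one (batch, dim) embedding per modality
        stacked = torch.stack(feats, dim=1)                  # (batch, M, dim)
        weights = torch.softmax(self.score(stacked), dim=1)  # (batch, M, 1)
        return (weights * stacked).sum(dim=1)                # (batch, dim)
```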
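Continuing the sketch above, the snippet below shows how frozen encoders, SS, MMRF, and the classifier could be wired together for a single synthetic 5-way episode. The toy linear encoders and random tensors are placeholders for the pre-trained point-cloud and multi-view backbones; the point is that only the SS, MMRF, and classifier parameters receive gradients, mirroring the training scheme the abstract describes.

```python
import torch.nn.functional as F

# Toy stand-ins for the pre-trained encoders (the paper would use point-cloud
# and multi-view networks pre-trained on a large-scale dataset).
pc_encoder = nn.Linear(3 * 1024, 128)   # flattened point cloud -> feature
mv_encoder = nn.Linear(12 * 256, 128)   # flattened view features -> feature
for p in [*pc_encoder.parameters(), *mv_encoder.parameters()]:
    p.requires_grad_(False)             # feature extractors stay frozen

ss_pc, ss_mv = ScaleShift(128), ScaleShift(128)
mmrf = MMRF(128)
classifier = nn.Linear(128, 5)          # 5-way episode

trainable = [*ss_pc.parameters(), *ss_mv.parameters(),
             *mmrf.parameters(), *classifier.parameters()]
opt = torch.optim.Adam(trainable, lr=1e-3)

# One synthetic 5-way 1-shot support set (random data, illustration only).
points = torch.randn(5, 3 * 1024)
views = torch.randn(5, 12 * 256)
labels = torch.arange(5)

fused = mmrf([ss_pc(pc_encoder(points)), ss_mv(mv_encoder(views))])
loss = F.cross_entropy(classifier(fused), labels)
opt.zero_grad()
loss.backward()
opt.step()   # updates SS, MMRF, and the classifier only
```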


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (U21B2024).

Author information

Corresponding author

Correspondence to An-An Liu.

Additional information

Communicated by Jianfei Cai.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhou, HY., Liu, AA., Zhang, CY. et al. Multi-Modal Meta-Transfer Fusion Network for Few-Shot 3D Model Classification. Int J Comput Vis 132, 673–688 (2024). https://doi.org/10.1007/s11263-023-01905-8

