Abstract
With the rapid development of 3D techniques and the resulting growth of large-scale 3D data, 3D model classification has attracted considerable attention from both the research and industrial communities. Most current methods depend heavily on abundant labeled 3D models, which substantially restricts their scalability to novel classes with few annotated training samples, since scarce labels increase the risk of overfitting. Moreover, these methods exploit only single-modal information (either point clouds or multi-view images), and few works integrate this complementary information for 3D model representation. To overcome these problems, we propose a multi-modal meta-transfer fusion network (M\(^{3}\)TF), the key of which is to perform few-shot multi-modal representation for 3D model classification. Specifically, we first convert the original 3D data into both the multi-view and point-cloud modalities, and pre-train an individual encoding network for each modality on a large-scale dataset to obtain good initial parameters, which benefits the subsequent few-shot learning tasks. Then, to adapt the network to few-shot learning, we meta-update the parameters of the scaling-and-shifting (SS) operation, the multi-modal representation fusion (MMRF) module, and the 3D model classifier to obtain optimal initializations. Because training the large number of parameters in the feature extractors would increase the risk of overfitting, we freeze the extractors and introduce the SS operation to modulate their weights; SS reduces the number of trainable parameters by up to 20%, which effectively mitigates overfitting. MMRF adaptively integrates the multi-modal information according to each modality's significance for the 3D model, yielding a more robust 3D representation. Since no dataset is available for evaluating this new task, we build three 3D CAD datasets, Meta-ModelNet, Meta-ShapeNet and Meta-RGBD, and implement representative methods on them for fair comparison. Extensive experimental results demonstrate the superiority of the proposed method.
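To make the two mechanisms named in the abstract concrete, the sketch below illustrates (i) a scaling-and-shifting (SS) layer that keeps a pre-trained weight matrix frozen and trains only lightweight per-channel scale/shift parameters, in the spirit of the meta-transfer learning of Sun et al. (2019), and (ii) a gated fusion module in the spirit of MMRF that weights the multi-view and point-cloud features by their predicted significance. This is a minimal PyTorch sketch under our own assumptions: the class names `SSLinear` and `MMRF`, the per-output-channel modulation, and the softmax gate are illustrative choices, not the authors' released implementation.

```python
# Minimal sketch of the two components named in the abstract (PyTorch).
# Class names and the gating design are illustrative assumptions,
# not the authors' exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SSLinear(nn.Module):
    """Frozen linear layer modulated by trainable scale/shift parameters.

    Only `scale` and `shift` receive gradients during meta-transfer, so
    the per-task trainable parameter count stays far below full fine-tuning.
    """

    def __init__(self, weight: torch.Tensor, bias: torch.Tensor):
        super().__init__()
        # Pre-trained parameters stay frozen (registered as buffers).
        self.register_buffer("weight", weight)
        self.register_buffer("bias", bias)
        # One scaling and one shifting parameter per output channel.
        self.scale = nn.Parameter(torch.ones(weight.size(0)))
        self.shift = nn.Parameter(torch.zeros(weight.size(0)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = (W * scale) x + (b + shift)
        w = self.weight * self.scale.unsqueeze(1)
        return F.linear(x, w, self.bias + self.shift)


class MMRF(nn.Module):
    """Adaptive fusion of multi-view and point-cloud features.

    A small gating network scores each modality; the fused descriptor is
    the importance-weighted sum of the two modal features.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)

    def forward(self, f_view: torch.Tensor, f_point: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(
            self.gate(torch.cat([f_view, f_point], dim=-1)), dim=-1
        )
        return weights[..., 0:1] * f_view + weights[..., 1:2] * f_point


# Toy usage: a 5-way episode with 512-d features from each modality encoder.
f_view, f_point = torch.randn(5, 512), torch.randn(5, 512)
layer = SSLinear(torch.randn(512, 512), torch.zeros(512))
fused = MMRF(512)(layer(f_view), f_point)  # -> (5, 512) fused descriptors
```

Under this reading, only the SS parameters, the fusion gate, and the final classifier would receive gradients during meta-training, which is what keeps the number of trainable parameters small and curbs overfitting on few-shot episodes.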








Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (U21B2024).
Additional information
Communicated by Jianfei Cai.
About this article
Cite this article
Zhou, HY., Liu, AA., Zhang, CY. et al. Multi-Modal Meta-Transfer Fusion Network for Few-Shot 3D Model Classification. Int J Comput Vis 132, 673–688 (2024). https://doi.org/10.1007/s11263-023-01905-8