
Meta Attention-Generation Network for Cross-Granularity Few-Shot Learning

Published in: International Journal of Computer Vision

Abstract

Fine-grained classification with few labeled samples is in urgent practical demand, since fine-grained samples are more difficult and expensive to collect and annotate. Standard few-shot learning (FSL) focuses on generalizing across seen and unseen classes at the same level of granularity; applying existing FSL methods to this problem therefore still requires large amounts of labeled samples for some fine-grained classes. Since samples of coarse-grained classes are much cheaper and easier to obtain, it is desirable to learn knowledge from coarse-grained categories that can be transferred to fine-grained classes with only a few samples. In this paper, we propose a novel learning problem called cross-granularity few-shot learning (CG-FSL), where sufficient samples of coarse-grained classes are available for training, but the goal in the test stage is to classify the fine-grained subclasses. This learning paradigm is consistent with findings in cognitive neuroscience. We first analyze CG-FSL through a Structural Causal Model (SCM) and show that the standard FSL model learned at the coarse-grained level is actually a confounder. We therefore perform backdoor adjustment to remove this interference and derive a causal CG-FSL model called the Meta Attention-Generation Network (MAGN), which is trained in a bilevel optimization manner. We construct benchmarks for the CG-FSL problem from several fine-grained image datasets and empirically show that our model significantly outperforms both standard FSL methods and baseline CG-FSL methods.
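The backdoor adjustment mentioned in the abstract is not spelled out in this preview. As a point of reference, Pearl's backdoor formula, applied under the abstract's premise that the coarse-grained FSL model acts as a confounder, would take the following form; the symbols X (query input), Y (fine-grained label), and D (the confounding coarse-grained model) are notational assumptions inferred from the abstract, not taken from the paper itself:

```latex
% Backdoor adjustment (Pearl): the effect of intervening on X is
% recovered by stratifying over the confounder D and averaging.
P(Y \mid \mathrm{do}(X)) = \sum_{d} P(Y \mid X, D = d)\, P(D = d)
```

The sum runs over the strata of the confounder; severing the path from D to X in this way is what "decoupling the interferences" of the coarse-grained model refers to.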




Data Availability Statement

The CUB-200 dataset analysed during the current study is available at https://resolver.caltech.edu/CaltechAUTHORS:20111026-155425465. The Stanford Car dataset is available at http://ai.stanford.edu/jkrause/cars/car_dataset.html. The Stanford Dog dataset is available at http://vision.stanford.edu/aditya86/ImageNetDogs/main.html. The FGVC-Aircraft dataset is available at https://www.robots.ox.ac.uk/vgg/data/fgvc-aircraft/. The Oxford Flower dataset is available at https://www.robots.ox.ac.uk/vgg/data/flowers/102/. The Veg200 dataset is available at https://github.com/ustc-vim/vegfru. The Meta-iNat dataset is available at https://github.com/visipedia/inat-comp/tree/master/2017. The Meta-Dataset is available at https://github.com/google-research/meta-dataset. The tieredImageNet dataset is available at https://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/. The miniImageNet dataset is available at https://github.com/twitter-research/meta-learning-lstm.


Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (No. 61976206 and No. 61832017), the Beijing Outstanding Young Scientist Program (No. BJJWZYJH012019100020098), Foshan HKUST Projects (FSUST21-FYTRI01A, FSUST21-FYTRI02A), the Beijing Academy of Artificial Intelligence (BAAI), the Fundamental Research Funds for the Central Universities, the Research Funds of Renmin University of China (21XNLG05), and the Public Computing Cloud, Renmin University of China.

Author information


Corresponding author

Correspondence to Bing Su.

Additional information

Communicated by Bumsub Ham.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Qiang, W., Li, J., Su, B. et al. Meta Attention-Generation Network for Cross-Granularity Few-Shot Learning. Int J Comput Vis 131, 1211–1233 (2023). https://doi.org/10.1007/s11263-023-01760-7

