Abstract
Fine-grained classification with few labeled samples is in urgent practical demand, since fine-grained samples are more difficult and expensive to collect and annotate. Standard few-shot learning (FSL) focuses on generalising from seen to unseen classes at the same level of granularity; applying existing FSL methods to this problem therefore still requires large amounts of labeled samples for some fine-grained classes. Since samples of coarse-grained classes are much cheaper and easier to obtain, it is desirable to learn knowledge from coarse-grained categories that can be transferred to fine-grained classes given only a few samples. In this paper, we propose a novel learning problem called cross-granularity few-shot learning (CG-FSL), in which sufficient samples of coarse-grained classes are available for training, but the goal at test time is to classify the fine-grained subclasses. This learning paradigm is consistent with findings in cognitive neuroscience. We first analyse CG-FSL through a Structural Causal Model (SCM) and show that a standard FSL model learned at the coarse-grained level actually acts as a confounder. We therefore perform backdoor adjustment to remove this confounding effect, and derive a causal CG-FSL model called the Meta Attention-Generation Network (MAGN), which is trained in a bilevel optimization manner. We construct benchmarks for the CG-FSL problem from several fine-grained image datasets and empirically show that our model significantly outperforms both standard FSL methods and baseline CG-FSL methods.
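For readers unfamiliar with the backdoor adjustment mentioned above, its general form from causal inference (this is the textbook formula, not the paper's specific instantiation, which depends on how the confounder is parameterised in the model) is:

```latex
P\bigl(Y \mid \mathrm{do}(X)\bigr) \;=\; \sum_{z} P\bigl(Y \mid X,\, Z = z\bigr)\, P\bigl(Z = z\bigr)
```

Here $X$ is the input, $Y$ the fine-grained label, and $Z$ the confounder (in this paper, the knowledge captured by the coarse-grained FSL model); summing over $z$ with the prior $P(Z=z)$, rather than conditioning on $Z$, blocks the backdoor path $X \leftarrow Z \rightarrow Y$.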
Data Availability Statement
The CUB-200 dataset analysed during the current study is available at https://resolver.caltech.edu/CaltechAUTHORS:20111026-155425465. The Stanford Car dataset is available at http://ai.stanford.edu/jkrause/cars/car_dataset.html. The Stanford Dog dataset is available at http://vision.stanford.edu/aditya86/ImageNetDogs/main.html. The FGVC-Aircraft dataset is available at https://www.robots.ox.ac.uk/vgg/data/fgvc-aircraft/. The Oxford Flower dataset is available at https://www.robots.ox.ac.uk/vgg/data/flowers/102/. The Veg200 dataset is available at https://github.com/ustc-vim/vegfru. The Meta-iNat dataset is available at https://github.com/visipedia/inat-comp/tree/master/2017. The Meta-Dataset is available at https://github.com/google-research/meta-dataset. The tieredImageNet dataset is available at https://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/. The miniImageNet dataset is available at https://github.com/twitter-research/meta-learning-lstm.
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China No. 61976206 and No. 61832017, Beijing Outstanding Young Scientist Program No. BJJWZYJH012019100020098, Foshan HKUST Projects (FSUST21-FYTRI01A, FSUST21-FYTRI02A), Beijing Academy of Artificial Intelligence (BAAI), the Fundamental Research Funds for the Central Universities, the Research Funds of Renmin University of China 21XNLG05, and Public Computing Cloud, Renmin University of China.
Additional information
Communicated by Bumsub Ham.
About this article
Cite this article
Qiang, W., Li, J., Su, B. et al. Meta Attention-Generation Network for Cross-Granularity Few-Shot Learning. Int J Comput Vis 131, 1211–1233 (2023). https://doi.org/10.1007/s11263-023-01760-7