Abstract
Fine-grained visual classification (FGVC) aims to distinguish sub-classes within the same meta-class. Prior work mainly mines features from discriminative parts to improve classification performance. However, we argue that most of these methods ignore the spatial details inside each part and the spatial correlations between parts when extracting local features and fusing global features, which limits feature quality, especially for irregular discriminative parts. To alleviate this issue, we rethink the feature generation route from pixels to parts to objects, and propose a novel graph-in-graph discriminative feature enhancement network (G\(^{2}\)DFE-Net). Specifically, G\(^{2}\)DFE-Net consists of two nested graph convolutional networks. An internal graph is first built with a spatial attention strategy to highlight details of the irregular discriminative regions. Then, a KNN-based external graph captures the spatial context correlations among independent discriminative parts. Through the collaboration of the internal and external graphs, G\(^{2}\)DFE-Net improves the class separability and compactness of the global feature representation, which benefits FGVC accuracy. We conduct thorough experiments on five benchmark datasets, and both quantitative and qualitative results show that G\(^{2}\)DFE-Net achieves higher accuracy than previous state-of-the-art algorithms. The code is available at https://github.com/WangYuPeng1/G2DFE-Net.
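To make the idea of a KNN-based external graph concrete, the following is a minimal sketch and not the authors' implementation (their code is in the repository linked above). It assumes each discriminative part is summarized by a 2-D center and a feature vector, links every part to its k spatially nearest parts, and applies one weight-free propagation step with symmetric normalization. The function names `knn_adjacency`, `normalize`, and `propagate` are hypothetical, and the learned weights and nonlinearity of a real graph convolution are omitted.

```python
import math


def knn_adjacency(centers, k):
    """Build a symmetric KNN adjacency matrix (with self-loops) over part centers.

    Assumption for illustration: neighbors are chosen by Euclidean distance
    between part centers; the paper may define neighborhoods differently.
    """
    n = len(centers)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so a part keeps its own feature
        # Rank all other parts by distance to part i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(centers[i], centers[j]),
        )
        for j in others[:k]:
            adj[i][j] = 1.0
            adj[j][i] = 1.0  # symmetrize: treat the graph as undirected
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def propagate(adj_norm, feats):
    """One propagation step H' = A_norm @ H (no learned weights)."""
    n, d = len(feats), len(feats[0])
    return [
        [sum(adj_norm[i][j] * feats[j][c] for j in range(n)) for c in range(d)]
        for i in range(n)
    ]


# Usage: four hypothetical parts, three close together and one far away.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
adj = knn_adjacency(centers, k=1)
mixed = propagate(normalize(adj), feats)
```

After propagation, each part's feature is a degree-weighted mix of its own feature and those of its spatial neighbors, which is the sense in which an external graph can inject context among otherwise independent parts.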
Data availability and access
Data will be made available on request.
Acknowledgements
This work was supported by the Jiangsu Province Key R&D Program (Modern Agriculture) Key Project (BE2023352), the Key Medical Research Projects of Jiangsu Provincial Health Commission (ZD2022068), and the National Natural Science Foundation of China (61941113).
Author information
Contributions
Yupeng Wang: Conceptualization, Software, Writing - Original draft preparation; Can Xu: Writing - Review & Editing; Yongli Wang: Methodology, Funding acquisition, Supervision; Weiping Ding: Visualization, Formal analysis and investigation, Supervision; Xiaoli Wang: Methodology, Data curation.
Ethics declarations
Competing interests
The authors declare that they have no conflict of interest.
Ethical and informed consent for data used
The study utilized publicly available datasets, ensuring adherence to ethical standards and data privacy regulations.
About this article
Cite this article
Wang, Y., Xu, C., Wang, Y. et al. Graph-in-graph discriminative feature enhancement network for fine-grained visual classification. Appl Intell 55, 22 (2025). https://doi.org/10.1007/s10489-024-05846-8
DOI: https://doi.org/10.1007/s10489-024-05846-8