Abstract
Fine-grained visual classification (FGVC) is the task of distinguishing sub-categories within a basic category. The task is both valuable and challenging: its difficulty arises primarily from small inter-class variation and large intra-class variation. The key to FGVC lies in identifying local regions with subtle yet discriminative features and representing them effectively. However, with the growing prevalence of deep convolutional neural networks, researchers have mainly relied on high-level, abstract semantic features for FGVC and have overlooked low-level, detailed information, which weakens feature representation. We therefore propose the multi-level navigation network (MLNN), which enhances feature representation by incorporating both high-level semantics and low-level details. Specifically, MLNN is composed of (1) the feature refinement and attention enhancement module, which enables the network to learn detailed feature representations and further enhances them with attention mechanisms, and (2) the triplet-enhanced multi-level fusion module, which integrates features from different levels to yield a more comprehensive feature representation. Experimental results show that our approach attains state-of-the-art performance on three widely used benchmark datasets.
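As a rough illustration of the two ideas named in the abstract, the sketch below fuses descriptors from several feature levels and scores them with a triplet loss. It is not the authors' implementation: the abstract does not specify the fusion operator or the loss form, so the sketch assumes concatenation of globally average-pooled, L2-normalized descriptors and the standard triplet margin loss on squared Euclidean distances. All function names and the margin value are hypothetical.

```python
import math

def global_average_pool(feature_map):
    """Reduce each channel (a list of rows) to its spatial mean."""
    return [
        sum(sum(row) for row in channel) / sum(len(row) for row in channel)
        for channel in feature_map
    ]

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def fuse_levels(level_feature_maps):
    """Assumed fusion: concatenate pooled, normalized descriptors,
    ordered from low-level (detail) to high-level (semantic) maps."""
    fused = []
    for fmap in level_feature_maps:
        fused.extend(l2_normalize(global_average_pool(fmap)))
    return fused

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin),
    with d the squared Euclidean distance. The margin is an assumption."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)

# Toy inputs: one low-level channel (2x2) and two high-level channels (1x1).
low = [[[1.0, 3.0], [5.0, 7.0]]]
high = [[[2.0]], [[0.0]]]
descriptor = fuse_levels([low, high])
```

The loss is zero once the negative sample lies farther from the anchor than the positive by at least the margin, so minimizing it pulls same-class descriptors together and pushes different-class descriptors apart.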





Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Code availability
Some or all of the code used during the study is available on request from the corresponding author.
Acknowledgements
The authors are indebted to the anonymous referees for their critical comments and suggestions, which helped improve this paper.
Funding
This work was supported by the National Natural Science Foundation of China (No. 61673396) and the Natural Science Foundation of Shandong Province (No. ZR2022MF260).
Author information
Contributions
HL and XL were instrumental in devising the concept; HL, XL, and QZ contributed to the development of the methodology. The software development was overseen by XL and QZ. HL, XL, MS, and QZ conducted the formal analysis. XL was in charge of drafting the initial manuscript. HL and MS participated in revising the manuscript and provided editorial input, secured funding, and supervised the project. Additionally, HL, MS, and QZ provided the necessary resources for the study.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
About this article
Cite this article
Liang, H., Li, X., Shao, M. et al. Multi-level navigation network: advancing fine-grained visual classification. J Supercomput 81, 409 (2025). https://doi.org/10.1007/s11227-025-06933-4
DOI: https://doi.org/10.1007/s11227-025-06933-4