
Multi-level navigation network: advancing fine-grained visual classification

The Journal of Supercomputing

Abstract

Fine-grained visual classification (FGVC) aims to distinguish subcategories within a basic-level category. The task is both valuable and challenging, chiefly because of its small inter-class variation and large intra-class variation. The key to FGVC lies in locating local regions that carry subtle yet discriminative features and representing them effectively. However, with the growing prevalence of deep convolutional neural networks, researchers have mainly relied on high-level, abstract semantic features for FGVC and have overlooked low-level detail information, which limits feature representation capability. We therefore propose the multi-level navigation network (MLNN), which enhances feature representation by combining high-level semantics with low-level details. Specifically, MLNN consists of (1) a feature refinement and attention enhancement module, which enables the network to learn detailed feature representations and further strengthens them with attention mechanisms, and (2) a triplet-enhanced multi-level fusion module, which integrates features from different levels into a more comprehensive representation. Experimental results show that our approach achieves state-of-the-art performance on three widely used benchmark datasets.
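The abstract does not specify the internals of either module, so the Python sketch below is a hypothetical, heavily simplified illustration of the ideas it names, not the authors' MLNN. All function names are invented for illustration. The channel gate is a parameter-free stand-in for a squeeze-and-excitation-style attention block, the multi-level fusion is assumed to be plain pooling, L2 normalization, and concatenation across levels, and "triplet-enhanced" is assumed to mean a FaceNet-style triplet margin loss applied to the fused descriptor.

```python
import math
from typing import List

# Hypothetical representation: a feature map is a list of channels,
# each channel a flat list of spatial activations.
FeatureMap = List[List[float]]


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over its spatial positions (one value per channel)."""
    return [sum(ch) / len(ch) for ch in fmap]


def channel_attention(fmap: FeatureMap) -> FeatureMap:
    """Parameter-free SE-style gate (assumption): scale each channel by the
    sigmoid of its pooled response. Real SE blocks insert learned FC layers."""
    weights = [1.0 / (1.0 + math.exp(-p)) for p in global_avg_pool(fmap)]
    return [[w * v for v in ch] for w, ch in zip(weights, fmap)]


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so no level dominates the fused descriptor."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def multi_level_descriptor(levels: List[FeatureMap]) -> List[float]:
    """Attend, pool, and normalize each level, then concatenate them
    in order from low-level (detail) to high-level (semantic)."""
    fused: List[float] = []
    for fmap in levels:
        fused.extend(l2_normalize(global_avg_pool(channel_attention(fmap))))
    return fused


def triplet_loss(anchor: List[float], positive: List[float],
                 negative: List[float], margin: float = 0.2) -> float:
    """FaceNet-style hinge loss on squared Euclidean distances: pull the
    same-class pair together and push the other-class sample at least
    `margin` farther away than the positive."""
    d_ap = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_an = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    low = [[0.1, 0.3, 0.2, 0.4], [0.0, 0.5, 0.5, 0.0]]  # 2 channels, 4 positions
    high = [[1.0, 2.0], [0.5, 0.5], [0.0, 1.0]]         # 3 channels, 2 positions
    desc = multi_level_descriptor([low, high])
    print(len(desc))  # 2 + 3 channels -> 5
    print(triplet_loss(desc, desc, [0.0] * len(desc)))
```

Normalizing each level before concatenation is one common way to keep low-level detail features from being swamped by higher-magnitude semantic features; an actual network would use learned convolutional and attention layers in a deep learning framework rather than these list-based stand-ins.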


Figs. 1–5 (images not reproduced)


Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Code availability

Some or all of the code used during the study is available on request from the corresponding author.


Acknowledgements

The authors are grateful to the anonymous referees for their critical comments and suggestions, which helped improve this paper.

Funding

This work was supported by the National Natural Science Foundation of China (No. 61673396) and the Natural Science Foundation of Shandong Province (No. ZR2022MF260).

Author information


Contributions

HL and XL conceived the concept; HL, XL, and QZ developed the methodology. XL and QZ oversaw software development. HL, XL, MS, and QZ conducted the formal analysis. XL drafted the initial manuscript. HL and MS reviewed and edited the manuscript, acquired funding, and supervised the project. HL, MS, and QZ provided resources for the study.

Corresponding author

Correspondence to Xian Li.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liang, H., Li, X., Shao, M. et al. Multi-level navigation network: advancing fine-grained visual classification. J Supercomput 81, 409 (2025). https://doi.org/10.1007/s11227-025-06933-4

  • DOI: https://doi.org/10.1007/s11227-025-06933-4
