A cross-granularity feature fusion method for fine-grained image recognition

Published in Applied Intelligence.

Abstract

Fine-grained image recognition is characterized by high interclass object similarity and large intraclass object variation. Many existing works focus on locating more discriminative parts, but it is difficult to extract multigranular features synchronously and fuse them to make joint decisions over parts of various granularities. To address these issues, this work proposes a novel cross-granularity feature fusion method. First, a multi-granularity feature generator obtains features of various granularities simultaneously from mid-level feature maps via its subgenerators. The subgenerators divide the feature maps into blocks to preserve the relative integrity of the local features, and randomly shuffle the divided blocks to increase the variance of the local regions. Then, a cross-granularity feature fusion strategy achieves joint decision-making over the multiple granularity features of fine-grained images. The proposed method can therefore extract features at various granularities and promote the synergistic interaction of richer granularity features. The effectiveness of the method is verified through comprehensive experiments on three widely used fine-grained object recognition benchmark datasets and a chip inner structure dataset. The experimental results show that the proposed method significantly outperforms the baseline and exhibits performance comparable to that of state-of-the-art (SOTA) methods. The source code is available at https://github.com/ShanWuJ/CGFF.
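The block division and shuffling described above can be sketched as follows. This is an illustrative reconstruction, not the authors' released implementation; the function name `shuffle_blocks` and its parameters are hypothetical:

```python
import numpy as np

def shuffle_blocks(feature_map, block_size, seed=None):
    """Divide a square feature map into non-overlapping blocks and
    randomly permute the blocks, keeping each block's interior intact
    (so local features stay relatively complete while the arrangement
    of local regions is randomized)."""
    rng = np.random.default_rng(seed)
    c, n, _ = feature_map.shape
    b = block_size
    assert n % b == 0, "feature map side must be divisible by block size"
    k = n // b
    # Cut the (c, n, n) map into k*k blocks of shape (c, b, b)
    blocks = [feature_map[:, i*b:(i+1)*b, j*b:(j+1)*b]
              for i in range(k) for j in range(k)]
    order = rng.permutation(len(blocks))
    out = np.empty_like(feature_map)
    for idx, src in enumerate(order):
        i, j = divmod(idx, k)
        out[:, i*b:(i+1)*b, j*b:(j+1)*b] = blocks[src]
    return out
```

Different subgenerators would use different `block_size` values to produce features at different granularities; the shuffled map is then fed to the backbone stage as usual.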


[Figures 1–12 appear in the full article.]


Data Availability

The CHIP dataset that supports the findings of this study is available from our institution. Restrictions apply to the availability of these data, which were used under license for this study; CHIP is available with the permission of our institution.

Code Availability

The source code is available at https://github.com/ShanWuJ/CGFF.


Acknowledgements

This work was supported by the Fund of the National Key Laboratory for Reliability Physics and Its Application Technology of Electrical Component under grant 6142806230201, the National Natural Science Foundation of China under grants 62221005 and 62276038, and the Key Cooperation Project of Chongqing Municipal Education Commission under grant HZ2021008.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Shan Wu contributed to the conception, data curation, formal analysis, methodology, software, validation and drafted the manuscript. Jun Hu contributed to the conception, data curation, methodology, and critically revised the manuscript. Chen Sun contributed to data curation, funding, methodology, resources, supervision, and revised the manuscript. Fujin Zhong contributed to the conception, methodology, and revised the manuscript. Qinghua Zhang contributed to the conception, funding, and revised the manuscript. Guoyin Wang contributed to the conception, methodology, project administration, supervision, funding, and revised the manuscript. All authors read and approved the final manuscript, and agreed to be accountable for all aspects of the work, ensuring that any questions related to the accuracy or integrity of the study are appropriately addressed and resolved.

Corresponding author

Correspondence to Jun Hu.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

A local region \(f_{loc}\) of size \((m+k) \times (m+k)\) in \(f^{1 \times n\times n}(x)\) is represented as (A1).

$$\begin{aligned} f_{loc}=\begin{bmatrix} a_{ij} & \cdots & a_{i(j+m-1)} & a_{i(j+m)} & \cdots & a_{i(j+m+k-1)}\\ a_{(i+1)j} & \cdots & a_{(i+1)(j+m-1)} & a_{(i+1)(j+m)} & \cdots & a_{(i+1)(j+m+k-1)}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{(i+m-1)j} & \cdots & a_{(i+m-1)(j+m-1)} & a_{(i+m-1)(j+m)} & \cdots & a_{(i+m-1)(j+m+k-1)}\\ a_{(i+m)j} & \cdots & a_{(i+m)(j+m-1)} & a_{(i+m)(j+m)} & \cdots & a_{(i+m)(j+m+k-1)}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{(i+m+k-1)j} & \cdots & a_{(i+m+k-1)(j+m-1)} & a_{(i+m+k-1)(j+m)} & \cdots & a_{(i+m+k-1)(j+m+k-1)} \end{bmatrix} \end{aligned}$$
(A1)

where \(i,j\in \{1,2,\dots ,n\}\), \((i+m+k-1) \le n\), and \((j+m+k-1) \le n\). The variance of the local region \(f_{loc}\) is given by (A2).

$$\begin{aligned} \sigma ^{2}_{f_{loc}}=\frac{1}{(m+k-1)}\sum \limits _{g=0}^{m+k-1}[\sum \limits _{p=i}^{i+m+k-1}(a_{p(j+g)}-\frac{1}{m+k}\sum \limits _{l=0}^{m+k-1}a_{p(j+l)})^2] \end{aligned}$$
(A2)
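Note that (A2) is equivalent to summing the sample variance of each row of \(f_{loc}\): the inner term is the squared deviation of each element from its row mean, and the single division by \((m+k-1)\) distributes over the rows. A quick numerical sanity check with hypothetical values (`region_variance` is an illustrative name, not from the paper):

```python
import numpy as np

def region_variance(f_loc):
    """Variance measure of (A2): squared deviations from each row's mean,
    summed over the whole region and divided by (m+k-1). Equals the sum
    of per-row sample variances."""
    s = f_loc.shape[1]  # s = m + k, the region's side length
    row_means = f_loc.mean(axis=1, keepdims=True)
    return ((f_loc - row_means) ** 2).sum() / (s - 1)

f_loc = np.array([[1., 2., 3.],
                  [4., 6., 8.],
                  [5., 5., 5.]])
direct = region_variance(f_loc)                       # formula (A2)
per_row = sum(np.var(row, ddof=1) for row in f_loc)   # sum of row variances
# the two computations agree: 1 + 4 + 0 = 5
```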

To make this more explicit, \(\sigma ^2_{f_{loc}}\) is rewritten as (A3), using the fact that the deviation of an element from its row mean equals the average of its differences with all elements of that row.

$$\begin{aligned} \begin{aligned}&\sigma ^2_{f_{loc}}=\frac{1}{(m+k-1)}\sum \limits _{p=i}^{i+m-1}\{ \sum \limits _{g=0}^{m-1}[(\frac{\sum \limits _{l=0}^{m-1}(a_{p(j+g)}-a_{p(j+l)})+ \sum \limits _{l=m}^{m+k-1}(a_{p(j+g)}-a_{p(j+l)})}{m+k})^2]\\&+\sum \limits _{g=m}^{m+k-1}[(\frac{ \sum \limits _{l=0}^{m-1}(a_{p(j+g)}-a_{p(j+l)})+\sum \limits _{l=m}^{m+k-1}(a_{p(j+g)}-a_{p(j+l)}) }{m+k})^2] \}\\&+\frac{1}{(m+k-1)}\sum \limits _{p=i+m}^{i+m+k-1}\{\sum \limits _{g=0}^{m+k-1} [(\frac{ \sum \limits _{l=0}^{m+k-1}(a_{p(j+g)}-a_{p(j+l)}) }{m+k})^2] \} \end{aligned} \end{aligned}$$
(A3)

The resulting local region \(f_{loc}^{'}\) after the exchange is represented as (A4).

$$\begin{aligned} f_{loc}^{'}=\begin{bmatrix} a_{rh} & \cdots & a_{r(h+m-1)} & a_{i(j+m)} & \cdots & a_{i(j+m+k-1)}\\ a_{(r+1)h} & \cdots & a_{(r+1)(h+m-1)} & a_{(i+1)(j+m)} & \cdots & a_{(i+1)(j+m+k-1)}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{(r+m-1)h} & \cdots & a_{(r+m-1)(h+m-1)} & a_{(i+m-1)(j+m)} & \cdots & a_{(i+m-1)(j+m+k-1)}\\ a_{(i+m)j} & \cdots & a_{(i+m)(j+m-1)} & a_{(i+m)(j+m)} & \cdots & a_{(i+m)(j+m+k-1)}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{(i+m+k-1)j} & \cdots & a_{(i+m+k-1)(j+m-1)} & a_{(i+m+k-1)(j+m)} & \cdots & a_{(i+m+k-1)(j+m+k-1)} \end{bmatrix} \end{aligned}$$
(A4)

where \(r,h\in \{1,2,\dots ,n\}\), \(r+m-1\le n\), and \(h+m-1\le n\). The variance \(\sigma ^2_{f^{'}_{loc}}\) of the local region \(f^{'}_{loc}\), obtained by the same derivation as (A2) and (A3), is given by (A5).

$$\begin{aligned} \begin{aligned}&\sigma ^2_{f^{'}_{loc}}=\frac{1}{(m+k-1)}\sum \limits _{p^{'}=r}^{r+m-1}\{ \sum \limits _{g=0}^{m-1}[ (\frac{ \sum \limits _{l=0}^{m-1}(a_{p^{'}(h+g)}-a_{p^{'}(h+l)})+\sum \limits _{l=m}^{m+k-1}(a_{p^{'}(h+g)}-a_{p^{'}(j+l)}) }{m+k})^2 ]\\&+\sum \limits _{g=m}^{m+k-1}[ (\frac{\sum \limits _{l=0}^{m-1}(a_{p^{'}(j+g)}-a_{p^{'}(h+l)})+\sum \limits _{l=m}^{m+k-1}(a_{p^{'}(j+g)}-a_{p^{'}(j+l)}) }{m+k})^2 ] \}\\&+\frac{1}{(m+k-1)}\sum \limits _{p^{'}=i+m}^{i+m+k-1}\{ \sum \limits _{g=0}^{m+k-1}[(\frac{ \sum \limits _{l=0}^{m+k-1}(a_{p^{'}(j+g)}-a_{p^{'}(j+l)}) }{m+k})^2] \} \end{aligned} \end{aligned}$$
(A5)
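The exchange in (A4) replaces the top-left \(m \times m\) sub-block with an \(m \times m\) block taken from elsewhere in the feature map, so the differences in (A5) are taken between values that are no longer spatially adjacent; on a smooth feature map this raises the region variance. A numerical sketch with a hypothetical smooth map (`region_variance` and the gradient map `x` are illustrative, not from the paper):

```python
import numpy as np

def region_variance(f_loc):
    """Variance measure of (A2): sum of per-row sample variances."""
    s = f_loc.shape[1]
    row_means = f_loc.mean(axis=1, keepdims=True)
    return ((f_loc - row_means) ** 2).sum() / (s - 1)

n, m, k = 16, 4, 2
# A smooth (spatially correlated) synthetic feature map: x[p, q] = p/15 + q/15
x = np.add.outer(np.linspace(0, 1, n), np.linspace(0, 1, n))

# Local region f_loc of size (m+k) x (m+k) at position (i, j) = (0, 0), as in (A1)
i = j = 0
f_loc = x[i:i+m+k, j:j+m+k].copy()
before = region_variance(f_loc)

# Exchange the top-left m x m block with a distant m x m block at (r, h), as in (A4)
r, h = 10, 10
f_swapped = f_loc.copy()
f_swapped[:m, :m] = x[r:r+m, h:h+m]
after = region_variance(f_swapped)
# after > before: the swapped-in block's values break the smooth gradient
```

This is the effect the shuffling exploits: larger local variance makes the network attend to structure within blocks rather than to the (now scrambled) global layout.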

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, S., Hu, J., Sun, C. et al. A cross-granularity feature fusion method for fine-grained image recognition. Appl Intell 55, 42 (2025). https://doi.org/10.1007/s10489-024-05891-3
