A cross-granularity feature fusion method for fine-grained image recognition

Wu, Shan; Hu, Jun; Sun, Chen; Zhong, Fujin; Zhang, Qinghua; Wang, Guoyin

doi:10.1007/s10489-024-05891-3

Published: 28 November 2024

Applied Intelligence, Volume 55, article number 42 (2025)

Shan Wu¹, Jun Hu¹, Chen Sun², Fujin Zhong¹, Qinghua Zhang¹ & Guoyin Wang¹

Abstract

Fine-grained image recognition is characterized by high interclass similarity and large intraclass variation. Many existing works focus on locating more discriminative parts, but it is difficult to extract multigranular features synchronously and to fuse them so that parts of different granularities contribute to a joint decision. To address these issues, this work proposes a novel cross-granularity feature fusion method. First, a multi-granularity feature generator uses its subgenerators to obtain features of various granularities simultaneously for mid-level feature maps. The subgenerators divide the feature maps into blocks to preserve the relative integrity of local features, and randomly shuffle the divided blocks to increase the variance of local regions. Then, a cross-granularity feature fusion strategy enables joint decision-making over the features of multiple granularities in fine-grained images. The proposed method can therefore extract features of various granularities and promote synergistic interaction among them. The effectiveness of the method is verified through comprehensive experiments on three widely used fine-grained object recognition benchmark datasets and a chip inner structure dataset. The experimental results show that the proposed method significantly outperforms the baseline and performs comparably to the SOTA method. Source code is available at https://github.com/ShanWuJ/CGFF
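
The block division and random shuffling described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository: the function name `shuffle_blocks`, the `(channels, height, width)` layout, and the requirement that the spatial size be divisible by the block size are all assumptions made for this example.

```python
import numpy as np


def shuffle_blocks(feature_map, block, rng):
    """Split a (C, H, W) feature map into block x block tiles and permute the tiles.

    Hypothetical helper for illustration; assumes H and W are multiples of `block`.
    """
    c, h, w = feature_map.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    gh, gw = h // block, w // block

    # (C, gh, block, gw, block) -> (gh * gw, C, block, block): one entry per tile.
    tiles = (
        feature_map.reshape(c, gh, block, gw, block)
        .transpose(1, 3, 0, 2, 4)
        .reshape(gh * gw, c, block, block)
    )

    # Randomly reorder the tiles; the content inside each tile stays intact.
    tiles = tiles[rng.permutation(gh * gw)]

    # Undo the tiling to get back a (C, H, W) array.
    return (
        tiles.reshape(gh, gw, c, block, block)
        .transpose(2, 0, 3, 1, 4)
        .reshape(c, h, w)
    )


x = np.arange(16, dtype=float).reshape(1, 4, 4)
y = shuffle_blocks(x, block=2, rng=np.random.default_rng(0))
print(y[0])
```

Each tile is moved as a whole, so the values inside a block keep their relative positions while neighboring blocks may now come from different parts of the map.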

Data Availability

The CHIP dataset that supports the findings of this study is available from our institution. Restrictions apply to the availability of these data, which were used under license for this study. CHIP is available with the permission of our institution.

Code Availability

Source code is available at https://github.com/ShanWuJ/CGFF

Acknowledgements

This work was supported by the Fund of the National Key Laboratory for Reliability Physics and Its Application Technology of Electrical Component under grant 6142806230201, the National Natural Science Foundation of China under grants 62221005 and 62276038, and the Key Cooperation Project of Chongqing Municipal Education Commission under grant HZ2021008.

Author information

Authors and Affiliations

Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
Shan Wu, Jun Hu, Fujin Zhong, Qinghua Zhang & Guoyin Wang
China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, 511370, China
Chen Sun

Contributions

All authors contributed to the study conception and design. Shan Wu contributed to the conception, data curation, formal analysis, methodology, software, and validation, and drafted the manuscript. Jun Hu contributed to the conception, data curation, and methodology, and critically revised the manuscript. Chen Sun contributed to data curation, funding, methodology, resources, and supervision, and revised the manuscript. Fujin Zhong contributed to the conception and methodology, and revised the manuscript. Qinghua Zhang contributed to the conception and funding, and revised the manuscript. Guoyin Wang contributed to the conception, methodology, project administration, supervision, and funding, and revised the manuscript. All authors read and approved the final manuscript, and agreed to be accountable for all aspects of the work, ensuring that any questions related to the accuracy or integrity of the study are appropriately addressed and resolved.

Corresponding author

Correspondence to Jun Hu.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

A local region $f_{loc}$ of size $(m+k) \times (m+k)$ in $f^{1 \times n\times n}(x)$ is represented as (A1).

$$\begin{aligned} f_{loc}=\begin{bmatrix} a_{ij} & \cdots & a_{i(j+m-1)} & a_{i(j+m)} & \cdots & a_{i(j+m+k-1)}\\ a_{(i+1)j} & \cdots & a_{(i+1)(j+m-1)} & a_{(i+1)(j+m)} & \cdots & a_{(i+1)(j+m+k-1)}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{(i+m-1)j} & \cdots & a_{(i+m-1)(j+m-1)} & a_{(i+m-1)(j+m)} & \cdots & a_{(i+m-1)(j+m+k-1)}\\ a_{(i+m)j} & \cdots & a_{(i+m)(j+m-1)} & a_{(i+m)(j+m)} & \cdots & a_{(i+m)(j+m+k-1)}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{(i+m+k-1)j} & \cdots & a_{(i+m+k-1)(j+m-1)} & a_{(i+m+k-1)(j+m)} & \cdots & a_{(i+m+k-1)(j+m+k-1)} \end{bmatrix} \end{aligned}$$

(A1)

where $i,j\in \{1,2,\dots ,n\}$, $(i+m+k-1) \le n$, and $(j+m+k-1) \le n$. The variance of the local region $f_{loc}$ is given by (A2).

$$\begin{aligned} \sigma ^{2}_{f_{loc}}=\frac{1}{(m+k-1)}\sum \limits _{g=0}^{m+k-1}[\sum \limits _{p=i}^{i+m+k-1}(a_{p(j+g)}-\frac{1}{m+k}\sum \limits _{l=0}^{m+k-1}a_{p(j+l)})^2] \end{aligned}$$

(A2)
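
As a small numerical illustration of (A2), the sketch below evaluates the expression on a toy array. The function name `local_region_variance`, the 0-based indexing (the appendix uses 1-based indices), and the toy feature map are assumptions made for this example; `size` plays the role of $m+k$.

```python
import numpy as np


def local_region_variance(feature_map, i, j, size):
    """Evaluate (A2) for the size x size region whose top-left corner is (i, j).

    Hypothetical helper; indices are 0-based and `size` corresponds to m + k.
    """
    region = feature_map[i:i + size, j:j + size]
    # Mean of each row of the region: (1 / (m + k)) * sum_l a_{p(j+l)}.
    row_means = region.mean(axis=1, keepdims=True)
    # Sum of squared deviations from the row means, scaled by 1 / (m + k - 1).
    return float(((region - row_means) ** 2).sum() / (size - 1))


toy_map = np.arange(16, dtype=float).reshape(4, 4)
print(local_region_variance(toy_map, i=0, j=0, size=3))  # prints 3.0
```

In the toy map every row of the region deviates from its row mean by $-1, 0, 1$, so the squared deviations sum to 6 and the result is $6/(3-1)=3$.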

To make the contribution of each block explicit, $\sigma ^2_{f_{loc}}$ is rewritten as (A3).

$$\begin{aligned}&\sigma ^2_{f_{loc}}=\frac{1}{(m+k-1)}\sum \limits _{p=i}^{i+m-1}\{ \sum \limits _{g=0}^{m-1}[(\frac{\sum \limits _{l=0}^{m-1}(a_{p(j+g)}-a_{p(j+l)})+ \sum \limits _{l=m}^{m+k-1}(a_{p(j+g)}-a_{p(j+l)})}{m+k})^2]\\&+\sum \limits _{g=m}^{m+k-1}[(\frac{ \sum \limits _{l=0}^{m-1}(a_{p(j+g)}-a_{p(j+l)})+\sum \limits _{l=m}^{m+k-1}(a_{p(j+g)}-a_{p(j+l)}) }{m+k})^2] \}\\&+\frac{1}{(m+k-1)}\sum \limits _{p=i+m}^{i+m+k-1}\{\sum \limits _{g=0}^{m+k-1} [(\frac{ \sum \limits _{l=0}^{m+k-1}(a_{p(j+g)}-a_{p(j+l)}) }{m+k})^2] \} \end{aligned}$$

(A3)
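
The step from (A2) to (A3) rests on moving the row mean inside the sum over $l$. Written out for a fixed row $p$ and column offset $g$, the identity is:

$$\begin{aligned} a_{p(j+g)}-\frac{1}{m+k}\sum \limits _{l=0}^{m+k-1}a_{p(j+l)}=\frac{1}{m+k}\sum \limits _{l=0}^{m+k-1}(a_{p(j+g)}-a_{p(j+l)}) \end{aligned}$$

Splitting the sums over $p$, $g$ and $l$ at the boundary between the first $m$ and the last $k$ indices then yields the three groups of terms in (A3).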

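After the exchange, in which the top-left $m \times m$ block is replaced by the block whose first element is $a_{rh}$, the resulting local region $f_{loc}^{'}$ is represented as (A4).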

$$\begin{aligned} f_{loc}^{'}=\begin{bmatrix} a_{rh} & \cdots & a_{r(h+m-1)} & a_{i(j+m)} & \cdots & a_{i(j+m+k-1)}\\ a_{(r+1)h} & \cdots & a_{(r+1)(h+m-1)} & a_{(i+1)(j+m)} & \cdots & a_{(i+1)(j+m+k-1)}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{(r+m-1)h} & \cdots & a_{(r+m-1)(h+m-1)} & a_{(i+m-1)(j+m)} & \cdots & a_{(i+m-1)(j+m+k-1)}\\ a_{(i+m)j} & \cdots & a_{(i+m)(j+m-1)} & a_{(i+m)(j+m)} & \cdots & a_{(i+m)(j+m+k-1)}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{(i+m+k-1)j} & \cdots & a_{(i+m+k-1)(j+m-1)} & a_{(i+m+k-1)(j+m)} & \cdots & a_{(i+m+k-1)(j+m+k-1)} \end{bmatrix} \end{aligned}$$

(A4)

where $r,h\in \{1,2,\dots ,n\}$, $r+m-1\le n$, and $h+m-1\le n$. The variance $\sigma ^2_{f^{'}_{loc}}$ of the local region $f^{'}_{loc}$ can be obtained in the same way as before (2), which gives (A5).

$$\begin{aligned}&\sigma ^2_{f^{'}_{loc}}=\frac{1}{(m+k-1)}\sum \limits _{p^{'}=r}^{r+m-1}\{ \sum \limits _{g=0}^{m-1}[ (\frac{ \sum \limits _{l=0}^{m-1}(a_{p^{'}(h+g)}-a_{p^{'}(h+l)})+\sum \limits _{l=m}^{m+k-1}(a_{p^{'}(h+g)}-a_{p^{'}(j+l)}) }{m+k})^2 ]\\&+\sum \limits _{g=m}^{m+k-1}[ (\frac{\sum \limits _{l=0}^{m-1}(a_{p^{'}(j+g)}-a_{p^{'}(h+l)})+\sum \limits _{l=m}^{m+k-1}(a_{p^{'}(j+g)}-a_{p^{'}(j+l)}) }{m+k})^2 ] \}\\&+\frac{1}{(m+k-1)}\sum \limits _{p^{'}=i+m}^{i+m+k-1}\{ \sum \limits _{g=0}^{m+k-1}[(\frac{ \sum \limits _{l=0}^{m+k-1}(a_{p^{'}(j+g)}-a_{p^{'}(j+l)}) }{m+k})^2] \} \end{aligned}$$

(A5)
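
The effect of the exchange in (A4) can be checked numerically with a short NumPy sketch. The helper names `row_centered_variance` and `exchange_block`, the 0-based indexing, and the toy feature map are assumptions made for this example and are not taken from the authors' code; the single case shown here does not by itself establish that the variance increases in general.

```python
import numpy as np


def row_centered_variance(region):
    """Variance of a square region as written in (A2) (hypothetical helper)."""
    size = region.shape[0]  # corresponds to m + k
    row_means = region.mean(axis=1, keepdims=True)
    return float(((region - row_means) ** 2).sum() / (size - 1))


def exchange_block(feature_map, i, j, r, h, m, k):
    """Build f'_loc of (A4) with 0-based indices (hypothetical helper).

    Copies the (m + k) x (m + k) region at (i, j) and overwrites its
    top-left m x m block with the m x m block located at (r, h).
    """
    region = feature_map[i:i + m + k, j:j + m + k].copy()
    region[:m, :m] = feature_map[r:r + m, h:h + m]
    return region


# Toy feature map: flat everywhere except a 2 x 2 block of threes.
toy_map = np.zeros((6, 6))
toy_map[4:6, 4:6] = 3.0

before = row_centered_variance(toy_map[0:3, 0:3])
after = row_centered_variance(exchange_block(toy_map, i=0, j=0, r=4, h=4, m=2, k=1))
print(before, after)  # prints 0.0 6.0
```

In this toy case the original region is constant, so its variance is zero, while the exchanged region mixes two different blocks in its first two rows and the variance rises to 6.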

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, S., Hu, J., Sun, C. et al. A cross-granularity feature fusion method for fine-grained image recognition. Appl Intell 55, 42 (2025). https://doi.org/10.1007/s10489-024-05891-3

Accepted: 25 October 2024
Published: 28 November 2024
DOI: https://doi.org/10.1007/s10489-024-05891-3
