A weakly supervised spatial group attention network for fine-grained visual recognition

Xie, Jiangjian; Zhong, Yujie; Zhang, Junguo; Zhang, Changchun; Schuller, Björn W

doi:10.1007/s10489-023-04627-z

Published: 07 July 2023

Volume 53, pages 23301–23315, (2023)

Applied Intelligence

Jiangjian Xie^1,2,3,
Yujie Zhong^1,
Junguo Zhang^1,2,
Changchun Zhang^1,2 &
Björn W Schuller^3,4,5

Abstract

Fine-grained visual recognition aims to classify sub-categories that belong to the same basic-level category. The task is highly challenging because variance within a sub-category is large while variance between different sub-categories is small. Previous approaches generally localize the targets or parts first, then determine which sub-category the image belongs to. They depend on target or part annotations, which are labor-intensive to produce and a barrier to practical use. Other methods indirectly extract discriminative regions from the high-level feature maps but ignore the spatial relationships between the target and its parts, which may cause inaccurate recognition. In this paper, we propose a weakly supervised spatial group attention network (WSSGA-Net) for fine-grained bird recognition. Based on the spatial relationships between the target and its parts, we embed a spatial group attention (SGA) module into the WSSGA-Net to highlight the correct semantic feature regions by establishing a semantic feature space enhancement mechanism. In addition, we apply moment exchange (MoEx) for data augmentation, generating new feature maps by exchanging the feature moments of two input images. Comprehensive experiments indicate that our approach performs significantly better than state-of-the-art approaches on the standard bird image datasets Bird-65 and CUB200-2011 and on the fine-grained dataset Stanford Cars.
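The moment exchange step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the "feature moments" are the mean and standard deviation taken across channels at each spatial position of a (C, H, W) feature map, and the function names `positional_moments` and `moment_exchange` are hypothetical. The network may well compute its moments along a different axis or at a different layer.

```python
import numpy as np


def positional_moments(feat, eps=1e-5):
    """Mean and std across channels at each spatial position of a (C, H, W) map.

    Assumption: moments are taken over the channel axis; eps avoids division by zero.
    """
    mean = feat.mean(axis=0, keepdims=True)
    std = np.sqrt(feat.var(axis=0, keepdims=True) + eps)
    return mean, std


def moment_exchange(feat_a, feat_b, eps=1e-5):
    """Normalize feat_a with its own moments, then re-scale and shift with feat_b's."""
    mean_a, std_a = positional_moments(feat_a, eps)
    mean_b, std_b = positional_moments(feat_b, eps)
    return (feat_a - mean_a) / std_a * std_b + mean_b


# Two synthetic feature maps standing in for the features of two input images.
rng = np.random.default_rng(0)
feat_a = rng.normal(loc=0.0, scale=1.0, size=(8, 4, 4))
feat_b = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4))

# The result keeps the normalized content of A but carries the moments of B.
mixed = moment_exchange(feat_a, feat_b)
```

In this sketch the new feature map has the same shape as the inputs. Its per-position mean matches that of the second image, and its standard deviation matches up to the small `eps` term, while its normalized pattern comes from the first image.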

Data availability

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Acknowledgements

This work was jointly supported by the Fundamental Research Funds for the Central Universities [No. 2021ZY70], the Beijing Municipal Natural Science Foundation [No. 6214040], the China Scholarship Council [No. 202106515010], and the Fundamental Research Funds for the Central Universities [No. BLX202129].

Author information

Authors and Affiliations

School of Technology, Beijing Forestry University, Beijing, 100083, P. R. China
Jiangjian Xie, Yujie Zhong, Junguo Zhang & Changchun Zhang
Research Center for Biodiversity Intelligent Monitoring, Beijing Forestry University, Beijing, 100083, P. R. China
Jiangjian Xie, Junguo Zhang & Changchun Zhang
Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, 86159, Germany
Jiangjian Xie & Björn W Schuller
GLAM – Group on Language Audio and Music, Imperial College London, London, SW7 2AZ, UK
Björn W Schuller
Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, 86159, Germany
Björn W Schuller

Corresponding author

Correspondence to Jiangjian Xie.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

About this article

Cite this article

Xie, J., Zhong, Y., Zhang, J. et al. A weakly supervised spatial group attention network for fine-grained visual recognition. Appl Intell 53, 23301–23315 (2023). https://doi.org/10.1007/s10489-023-04627-z

Accepted: 08 April 2023
Issue Date: October 2023
