Aggregate attention module for fine-grained image classification

Wang, Xingmei; Shi, Jiahao; Fujita, Hamido; Zhao, Yilin

doi:10.1007/s12652-021-03599-7

Original Research
Published: 20 November 2021

Volume 14, pages 8335–8345, (2023)

Journal of Ambient Intelligence and Humanized Computing

Xingmei Wang¹, Jiahao Shi¹, Hamido Fujita²,³,⁴ (ORCID: orcid.org/0000-0001-5256-210X) & Yilin Zhao¹

Abstract

Because of large intra-class diversity and small inter-class differences, fine-grained image classification has long been a difficult problem. Attention mechanisms have proven useful for aggregating features and discovering discriminative local details. However, the additional parameters they introduce lead to unnecessary computation. In this paper, an attention mechanism named the aggregate attention module is proposed to classify fine-grained images accurately with fewer parameters. Specifically, to balance performance against complexity, the proposed module combines channel attention and spatial attention in parallel, which effectively learns the key features and can be easily extended to other neural models. In addition, we design a cross-channel loss to explore discriminative fine-grained categories. Experiments on three fine-grained image benchmarks (CUB-200-2011, FGVC Aircraft and Stanford Cars) show that the proposed model achieves superior accuracy compared with state-of-the-art models. To verify its effectiveness, we further evaluate our method through an ablation study and visualization.
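As a rough illustration of what combining channel attention and spatial attention "in parallel" can mean, the sketch below applies both gates to the same input feature map and averages the two gated results. It is not the authors' module: the parameter-free gates (global average pooling followed by a sigmoid), the averaging step, and the names `channel_gate`, `spatial_gate` and `parallel_attention` are all assumptions made for illustration, and a real module would likely use learned layers in each branch.

```python
import math


def sigmoid(v):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def channel_gate(x):
    """One weight per channel: global average pool over H x W, then sigmoid.

    x is a nested list of shape [C][H][W]. (Hypothetical, parameter-free gate.)
    """
    gates = []
    for channel in x:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        gates.append(sigmoid(total / count))
    return gates


def spatial_gate(x):
    """One weight per spatial position: mean over channels, then sigmoid."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def parallel_attention(x):
    """Apply both gates to the SAME input and average the two gated maps.

    Assumption: the branches are fused by averaging; the paper may fuse
    them differently.
    """
    cg = channel_gate(x)
    sg = spatial_gate(x)
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [
            [0.5 * x[k][i][j] * (cg[k] + sg[i][j]) for j in range(w)]
            for i in range(h)
        ]
        for k in range(c)
    ]


if __name__ == "__main__":
    # Toy feature map with 2 channels of size 2 x 2.
    features = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[0.0, -1.0], [0.5, 2.0]],
    ]
    for channel in parallel_attention(features):
        print(channel)
```

Because neither branch consumes the other's output, the two gates can be computed independently, which is the property that distinguishes a parallel arrangement from a sequential one such as channel attention followed by spatial attention.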



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 41876110).

Author information

Authors and Affiliations

College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, China
Xingmei Wang, Jiahao Shi & Yilin Zhao
Faculty of Information Technology, Ho Chi Minh City University of Technology (HUTECH), Ho Chi Minh City, Vietnam
Hamido Fujita
Andalusian Research Institute in Data Science and Computational Intelligence, University of Granada, Granada, Spain
Hamido Fujita
College of Mathematical Sciences, Harbin Engineering University, Harbin, 150001, China
Hamido Fujita

Corresponding author

Correspondence to Hamido Fujita.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, X., Shi, J., Fujita, H. et al. Aggregate attention module for fine-grained image classification. J Ambient Intell Human Comput 14, 8335–8345 (2023). https://doi.org/10.1007/s12652-021-03599-7

Received: 21 June 2021
Accepted: 03 November 2021
Published: 20 November 2021
Issue Date: July 2023
DOI: https://doi.org/10.1007/s12652-021-03599-7
