Abstract
Owing to large intra-class diversity and subtle inter-class differences, fine-grained image classification has long been a difficult problem. Attention mechanisms have proven useful for aggregating features and discovering discriminative local details. However, they add parameters, which leads to unnecessary computation. In this paper, we propose an attention mechanism, the aggregate attention module, that classifies fine-grained images accurately with fewer parameters. Specifically, to balance performance against complexity, the proposed module combines channel attention and spatial attention in parallel; it learns the key features effectively and can be easily extended to other neural models. In addition, we design a cross-channel loss to better discriminate between fine-grained categories. Experiments on three fine-grained image benchmarks (CUB-200-2011, FGVC Aircraft and Stanford Cars) show that the proposed model can achieve higher accuracy than state-of-the-art models. To verify its effectiveness, we further evaluate the method through an ablation study and visualization.
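As a rough illustration of what combining channel attention with spatial attention "in parallel" can mean, the following minimal sketch applies both branches to the same input feature map and sums the two reweighted maps. The abstract does not specify the internals of the module, so the global average pooling, the sigmoid gating, the summation used to fuse the branches, and all function names here are assumptions made for illustration; the learned layers of a real attention module are omitted.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_weights(x):
    """One weight per channel: sigmoid of that channel's global average (assumed design)."""
    weights = []
    for channel in x:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        weights.append(sigmoid(total / count))
    return weights

def spatial_weights(x):
    """One weight per (row, col) position: sigmoid of the mean over channels (assumed design)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]

def parallel_attention(x):
    """Both branches read the same input; their reweighted maps are summed (assumed fusion)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    cw = channel_weights(x)
    sw = spatial_weights(x)
    return [
        [[x[k][i][j] * (cw[k] + sw[i][j]) for j in range(w)] for i in range(h)]
        for k in range(c)
    ]

# Toy feature map with 2 channels of size 2 x 2, stored as nested lists [C][H][W].
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
attended = parallel_attention(features)
```

Because neither branch consumes the output of the other, the two weight maps can be computed independently, in contrast to a sequential arrangement where spatial attention would operate on features that channel attention has already rescaled.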
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 41876110).
Cite this article
Wang, X., Shi, J., Fujita, H. et al. Aggregate attention module for fine-grained image classification. J Ambient Intell Human Comput 14, 8335–8345 (2023). https://doi.org/10.1007/s12652-021-03599-7
DOI: https://doi.org/10.1007/s12652-021-03599-7