Aggregate attention module for fine-grained image classification

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Owing to large intra-class diversity and subtle inter-class differences, fine-grained image classification has long been a difficult problem. Attention mechanisms have proven useful for aggregating features and discovering discriminative local details. However, they add parameters, which leads to unnecessary computation. In this paper, we propose an attention mechanism, the aggregate attention module, that classifies fine-grained images accurately with fewer parameters. Specifically, to balance the trade-off between performance and complexity, the proposed module combines channel attention and spatial attention in parallel; it learns the key features effectively and can be easily extended to other neural models. In addition, we design a cross-channel loss to help discriminate between fine-grained categories. Experiments on three fine-grained image benchmarks (CUB-200-2011, FGVC Aircraft and Stanford Cars) show that the proposed model achieves higher accuracy than state-of-the-art models. To verify its effectiveness, we further evaluate our method through an ablation study and visualization.
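
To make the idea of combining channel attention and spatial attention in parallel concrete, the following is a minimal NumPy sketch. It is not the authors' module: the abstract does not specify the layer design, so the pooling choices, the bottleneck weights `w1` and `w2`, the spatial mixing weights `w_sp`, and the additive fusion of the two branches are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used to squash attention scores into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """Return one gate per channel for a feature map x of shape (C, H, W).

    Assumed design: global average pooling followed by a two-layer
    bottleneck (w1: (C_r, C), w2: (C, C_r)) with a ReLU in between.
    """
    squeezed = x.mean(axis=(1, 2))              # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)     # (C_r,)
    return sigmoid(w2 @ hidden)                 # (C,)


def spatial_attention(x, w_sp):
    """Return one gate per spatial location for x of shape (C, H, W).

    Assumed design: mix the channel-wise mean and max maps with two
    scalar weights w_sp = (w_avg, w_max).
    """
    avg_map = x.mean(axis=0)                    # (H, W)
    max_map = x.max(axis=0)                     # (H, W)
    return sigmoid(w_sp[0] * avg_map + w_sp[1] * max_map)


def parallel_attention(x, w1, w2, w_sp):
    """Apply both branches to the same input and sum the results.

    Both gates are computed from x itself (parallel), rather than one
    branch feeding the other (sequential). Summation is an assumption.
    """
    ca = channel_attention(x, w1, w2)[:, None, None]   # broadcast over H, W
    sa = spatial_attention(x, w_sp)[None, :, :]        # broadcast over C
    return x * ca + x * sa


# Hypothetical usage on random features: 8 channels, 4x4 spatial grid.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = 0.1 * rng.standard_normal((2, 8))   # reduce 8 channels to 2
w2 = 0.1 * rng.standard_normal((8, 2))   # expand back to 8 channels
w_sp = np.array([0.5, 0.5])
out = parallel_attention(x, w1, w2, w_sp)
```

In this sketch the output has the same shape as the input, so such a block could in principle be dropped between existing layers of a backbone network. The parameter count is dominated by the two small bottleneck matrices, which illustrates how an attention block can stay lightweight.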

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 41876110).

Author information

Corresponding author

Correspondence to Hamido Fujita.

About this article

Cite this article

Wang, X., Shi, J., Fujita, H. et al. Aggregate attention module for fine-grained image classification. J Ambient Intell Human Comput 14, 8335–8345 (2023). https://doi.org/10.1007/s12652-021-03599-7

  • DOI: https://doi.org/10.1007/s12652-021-03599-7
