
Subtler mixed attention network on fine-grained image classification


Abstract

The key to fine-grained image categorization is to locate discriminative regions and to extract from these regions features that correspond to subtle visual traits. Some current methods use the attention mechanism to identify discriminative regions, but ignore the fact that these regions still contain a large amount of non-foreground noise. In this work, we propose a Subtler Mixed Attention Network (SMA-Net), which contains two modules: 1) a discriminative region location module, which uses the channel attention mechanism to construct a feature pyramid network that locates discriminative regions, screens a group of the most discriminative regions according to their positive effect on classification, and learns them through ranking; 2) a mixed attention module (MAM) for feature extraction, which can focus on subtler, differentiated regions. We divide the feature map into intervals by region and learn attention features in a region-oriented manner. The attention maps are then multiplied with the input feature map to adaptively reinforce features. At the same time, MAM is a lightweight module that can be easily integrated into advanced networks without adding much computation. We validated SMA-Net through extensive experiments on Caltech-UCSD Birds (CUB-200-2011), Stanford Cars, CIFAR-10, Fish4Knowledge and Flower17. In particular, the accuracy on the two widely used fine-grained datasets, CUB-200-2011 and Stanford Cars, reached 87.71% and 94.37%, respectively.
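To make the abstract's description of the mixed attention module concrete, the sketch below shows one plausible reading of it: a channel attention branch combined with a region-wise spatial attention branch, both multiplied with the input feature map for adaptive feature reinforcement. This is a minimal illustration only; the class name MixedAttentionSketch, the horizontal-strip region split, and the hyperparameters (reduction, regions) are assumptions and not the authors' actual implementation, which the abstract only outlines.

```python
import torch
import torch.nn as nn


class MixedAttentionSketch(nn.Module):
    """Illustrative mixed attention block (hypothetical, not the paper's code):
    channel attention (squeeze-and-excitation style) combined with a
    region-wise spatial attention, applied multiplicatively to the input."""

    def __init__(self, channels, reduction=16, regions=4):
        super().__init__()
        self.regions = regions
        # Channel attention branch: global pooling -> bottleneck MLP -> sigmoid.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Spatial attention branch: 1x1 conv producing one weight per location.
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Channel attention weights, shaped for broadcasting over H x W.
        ca = self.channel_fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        # Region-wise spatial attention: split the map into horizontal strips
        # ("intervals") and weight each strip independently.
        sa = self.spatial_conv(x)                      # (b, 1, h, w)
        strips = torch.chunk(sa, self.regions, dim=2)  # split along height
        sa = torch.cat([torch.sigmoid(s) for s in strips], dim=2)
        # Multiply both attention maps with the input (adaptive reinforcement).
        return x * ca * sa


if __name__ == "__main__":
    feat = torch.randn(2, 256, 28, 28)        # dummy backbone feature map
    out = MixedAttentionSketch(256)(feat)
    print(out.shape)                          # torch.Size([2, 256, 28, 28])
```

Because the block preserves the input shape and only adds a small MLP and a 1x1 convolution, it can be dropped after any backbone stage, which is consistent with the abstract's claim that MAM is lightweight and easy to integrate.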



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61872326, No. 61672475) and the Shandong Provincial Natural Science Foundation (ZR2019MF044). This work received GPU computation support from the Center for High Performance Computing and System Simulation, Pilot National Laboratory for Marine Science and Technology (Qingdao).

Author information

Corresponding author

Correspondence to Lei Huang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Liu, C., Huang, L., Wei, Z. et al. Subtler mixed attention network on fine-grained image classification. Appl Intell 51, 7903–7916 (2021). https://doi.org/10.1007/s10489-021-02280-y
