Abstract
The key to fine-grained image categorization is locating discriminative regions and extracting from them the features that correspond to subtle visual traits. Some current methods use an attention mechanism to identify discriminative regions but ignore the large amount of non-foreground noise that these regions still contain. In this work, we propose a Subtler Mixed Attention Network (SMA-Net), which contains two modules: 1) A discriminative region location module, which uses the channel attention mechanism to construct a feature pyramid network that locates discriminative regions. It uses each region's positive effect on classification to screen a group of the most discriminative regions, and is trained with learning to rank. 2) A mixed attention module (MAM) for feature extraction, which focuses on subtler and more differentiated regions. We divide the feature map into intervals according to regions and learn attention features according to regional orientation. The attention maps are then multiplied with the input feature map for adaptive feature reinforcement. MAM is also a lightweight module that can be easily integrated into advanced networks without adding much computation. We validated SMA-Net through extensive experiments on Caltech-UCSD Birds (CUB-200-2011), Stanford Cars, CIFAR-10, Fish4Knowledge and Flower17. In particular, the accuracy on two widely used fine-grained datasets, CUB-200-2011 and Stanford Cars, reached 87.71% and 94.37%, respectively.
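The idea of dividing a feature map into regional intervals, learning an attention weight per interval, and multiplying the weights back onto the input can be illustrated with a minimal sketch. This is not the authors' MAM: the gates here are fixed sigmoid functions of mean activations rather than learned layers, the regions are plain horizontal bands, and every function name (`channel_attention`, `region_attention`, `mixed_attention`) is hypothetical. The feature map is assumed to be a nested list of shape C x H x W.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap):
    """One gate in (0, 1) per channel, from the channel's global average.
    Assumption: a real module would pass the averages through learned layers."""
    gates = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gates.append(sigmoid(sum(values) / len(values)))
    return gates

def region_attention(fmap, num_regions):
    """Split the rows into horizontal bands and return one gate per band,
    computed from the band's mean activation over all channels."""
    height = len(fmap[0])
    band = math.ceil(height / num_regions)  # rows per band
    gates = []
    for start in range(0, height, band):
        values = [v for channel in fmap
                  for row in channel[start:start + band]
                  for v in row]
        gates.append(sigmoid(sum(values) / len(values)))
    return gates, band

def mixed_attention(fmap, num_regions=2):
    """Multiply each value by its channel gate and its region gate.
    The output has the same C x H x W shape as the input."""
    c_gates = channel_attention(fmap)
    r_gates, band = region_attention(fmap, num_regions)
    return [
        [[v * c_gates[c] * r_gates[i // band] for v in row]
         for i, row in enumerate(channel)]
        for c, channel in enumerate(fmap)
    ]

if __name__ == "__main__":
    # Two channels, 4 x 2 each; the lower band of channel 0 is more active.
    fmap = [
        [[0.0, 0.0], [0.0, 0.0], [2.0, 2.0], [2.0, 2.0]],
        [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
    ]
    for channel in mixed_attention(fmap, num_regions=2):
        print(channel)
```

Because every gate lies strictly between 0 and 1, the sketch can only attenuate activations; bands and channels with higher mean activation are attenuated less, which is the reweighting effect the abstract refers to as feature reinforcement. The output keeps the input's shape, so a module of this kind can be placed between existing layers of a network.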
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61872326, No. 61672475) and the Shandong Provincial Natural Science Foundation (ZR2019MF044). GPU computing support was provided by the Center for High Performance Computing and System Simulation, Pilot National Laboratory for Marine Science and Technology (Qingdao).
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Liu, C., Huang, L., Wei, Z. et al. Subtler mixed attention network on fine-grained image classification. Appl Intell 51, 7903–7916 (2021). https://doi.org/10.1007/s10489-021-02280-y
DOI: https://doi.org/10.1007/s10489-021-02280-y