Abstract
Fine-grained recognition is a challenging problem, because it is difficult to mine discriminative and subtle features for objects with similar visual appearance. Because massive manual annotations (e.g., bounding boxes for discriminative regions) are time-consuming and labor-intensive, existing methods design a single form of attention model that outputs discriminative regions in a weakly supervised way. In this paper, we propose a novel method named Spatial-Channel Aware Attention Filters (SCAF) to address the weakly supervised fine-grained recognition problem. Compared with previous attention models, SCAF obtains attention-aware features along two dimensions, i.e., the spatial location in the image and the channel of the feature maps. With the proposed SCAF, the model can enhance the discriminative regions on both the spatial and channel dimensions simultaneously. In addition, a multi-channel network and a multi-level structure are designed to extract richer regional features. Moreover, focal loss is introduced to balance the sample distribution of fine-grained image datasets. Comprehensive comparative experiments are conducted on publicly available datasets, and the experimental results show that our method achieves state-of-the-art performance on fine-grained recognition tasks. For instance, we achieve 99.370% and 80.749% accuracy on two underwater datasets, Fish4Knowledge and Wild Fish, respectively.
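To illustrate the focal loss mentioned above, the following is a minimal sketch of its commonly used multi-class form, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the softmax probability of the true class. The abstract does not state which variant or hyperparameters are used, so the function names `softmax` and `focal_loss`, the defaults `gamma=2.0` and `alpha=1.0`, and the example logits are assumptions made only for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample (illustrative sketch, assumed defaults).

    p_t is the predicted probability of the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of well-classified samples,
    so hard samples contribute relatively more to training.
    With gamma = 0 and alpha = 1 this is ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical three-class scores: one confident correct prediction
# and one confident wrong prediction, both with true class 0.
easy_logits = [4.0, 0.0, 0.0]
hard_logits = [0.0, 4.0, 0.0]

easy_fl = focal_loss(easy_logits, target=0)
hard_fl = focal_loss(hard_logits, target=0)
easy_ce = focal_loss(easy_logits, target=0, gamma=0.0)
hard_ce = focal_loss(hard_logits, target=0, gamma=0.0)
```

In this sketch the easy sample is down-weighted far more strongly than the hard one relative to plain cross-entropy, which is the mechanism by which focal loss shifts emphasis toward difficult or under-represented samples.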








Acknowledgments
This work was supported by the National Key R&D Program of China (2019YFD0900401), the National Natural Science Foundation of China (Nos. 61872326 and 61672475), and the Shandong Provincial Natural Science Foundation (ZR2019MF044).
Cite this article
Yu, N., Huang, L., Wei, Z. et al. Weakly supervised fine-grained recognition based on spatial-channel aware attention filters. Multimed Tools Appl 80, 14409–14427 (2021). https://doi.org/10.1007/s11042-020-10268-y
DOI: https://doi.org/10.1007/s11042-020-10268-y