Weakly supervised fine-grained recognition based on spatial-channel aware attention filters

Multimedia Tools and Applications

Abstract

Fine-grained recognition is challenging because it is difficult to mine subtle, discriminative features from objects with similar visual appearance. Because manual annotations (e.g., bounding boxes for discriminative regions) are time-consuming and labor-intensive to collect, existing methods typically design a single form of attention model that outputs discriminative regions in a weakly supervised way. In this paper, we propose a novel method named Spatial-Channel Aware Attention Filters (SCAF) to address the weakly supervised fine-grained recognition problem. Compared with previous attention models, SCAF obtains attention-aware features along two dimensions: the spatial locations of the image and the channels of the feature maps. With SCAF, the model can enhance discriminative regions on the spatial and channel dimensions simultaneously. In addition, a multi-channel, multi-level network structure is designed to extract richer regional features. Moreover, focal loss is introduced to balance the sample distribution of fine-grained image datasets. Comprehensive comparative experiments are conducted on publicly available datasets, and the results show that our method achieves state-of-the-art performance on fine-grained recognition tasks. For instance, we achieve 99.370% and 80.749% accuracy on two underwater datasets, Fish4Knowledge and WildFish, respectively.
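
The abstract does not state how the focal loss is configured, so the following is only a minimal sketch of the standard multi-class formulation, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the softmax probability of the true class. The function names, the defaults gamma = 2 and alpha = 1, and the example logits are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample (hypothetical helper, not the authors' code).

    logits: raw class scores for the sample
    target: index of the true class
    gamma:  focusing parameter; gamma = 0 reduces to cross-entropy
    alpha:  scalar weighting factor (assumed constant here)
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, expressed as focal loss with gamma = 0."""
    return focal_loss(logits, target, gamma=0.0, alpha=1.0)


# Illustrative logits: a confidently correct sample and a badly wrong one.
easy_logits = [5.0, 0.0, 0.0]  # true class 0 gets most of the probability
hard_logits = [0.0, 5.0, 0.0]  # true class 0 gets very little probability

easy_ratio = focal_loss(easy_logits, 0) / cross_entropy(easy_logits, 0)
hard_ratio = focal_loss(hard_logits, 0) / cross_entropy(hard_logits, 0)
print(f"easy sample keeps {easy_ratio:.4%} of its cross-entropy loss")
print(f"hard sample keeps {hard_ratio:.4%} of its cross-entropy loss")
```

The modulating factor (1 - p_t)^gamma shrinks the loss of well-classified samples far more than that of misclassified ones, which is why the loss is a plausible fit for datasets where a few classes supply most of the easy examples.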


Acknowledgments

This work is supported by the National Key R&D Program of China (2019YFD0900401), the National Natural Science Foundation of China (No. 61872326, No. 61672475), and the Shandong Provincial Natural Science Foundation (ZR2019MF044).

Author information

Corresponding author

Correspondence to Lei Huang.

Cite this article

Yu, N., Huang, L., Wei, Z. et al. Weakly supervised fine-grained recognition based on spatial-channel aware attention filters. Multimed Tools Appl 80, 14409–14427 (2021). https://doi.org/10.1007/s11042-020-10268-y

  • DOI: https://doi.org/10.1007/s11042-020-10268-y
