Attention cutting and padding learning for fine-grained image recognition

Multimedia Tools and Applications

Abstract

Fine-grained image recognition is an important task in computer vision. Because the differences between categories are very small, recognition depends heavily on local features. This paper proposes a novel “Attention Cutting And Padding Learning” method for learning these local features. First, the image is fed to Convolutional Neural Networks (CNNs) to obtain a saliency map, from which the attention image is derived. Second, the attention image is cut into \(N \times N\) sub-images, each sub-image is zero-padded with padding size P, and all padded sub-images are spliced into a Cutting And Padding image. Finally, the Cutting And Padding image and the attention image are both fed to CNNs for training. In this way, more local features can be learned without damaging the high-level semantics. Experimental results show that Attention Cutting And Padding Learning achieves recognition accuracies of 87.9%, 94.6%, and 92.4% on the CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets, respectively. Moreover, the method can be readily applied to automatic biodiversity monitoring, intelligent retail, intelligent transportation, and other fields to improve recognition accuracy without changing the network structure.
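
The cutting and padding step described in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function name `cut_and_pad`, the choice to pad every sub-image on all four sides, and the requirement that the image size be divisible by N are assumptions made here for illustration, since the abstract only states that each sub-image is padded by 0 with padding size P and that the results are spliced together.

```python
import numpy as np


def cut_and_pad(image: np.ndarray, n: int, p: int) -> np.ndarray:
    """Cut `image` into an n x n grid, zero-pad each tile by p, and re-splice.

    Hypothetical helper: padding on all four sides of every tile is an
    assumption, not something the abstract specifies.
    """
    height, width = image.shape[:2]
    if height % n != 0 or width % n != 0:
        raise ValueError("image height and width must be divisible by n")
    tile_h, tile_w = height // n, width // n

    # Pad only the two spatial axes; leave any channel axis untouched.
    pad_width = [(p, p), (p, p)] + [(0, 0)] * (image.ndim - 2)

    rows = []
    for i in range(n):
        padded_tiles = []
        for j in range(n):
            tile = image[i * tile_h:(i + 1) * tile_h,
                         j * tile_w:(j + 1) * tile_w]
            padded_tiles.append(
                np.pad(tile, pad_width, mode="constant", constant_values=0)
            )
        # Splice the padded tiles of one grid row side by side.
        rows.append(np.concatenate(padded_tiles, axis=1))
    # Stack the grid rows vertically to form the spliced image.
    return np.concatenate(rows, axis=0)
```

Under these assumptions, an H x W input yields an output of size (H + 2NP) x (W + 2NP), so the zero bands separate neighbouring sub-images while their relative order is preserved. Whether the result is resized before being fed to the network is not stated in the abstract and is left out of the sketch.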

Acknowledgements

This work was supported by the Chongqing Science and Technology Commission Project (Grant Nos. cstc2017jcyjAX0142 and cstc2018jcyjAX0525) and the Key Research and Development Projects of Sichuan Science and Technology Department (Grant No. 2019YFG0107).

Author information

Corresponding author

Correspondence to Hongjian Li.

About this article

Cite this article

Cheng, Z., Li, H., Duan, X. et al. Attention cutting and padding learning for fine-grained image recognition. Multimed Tools Appl 80, 32791–32805 (2021). https://doi.org/10.1007/s11042-021-11314-z

  • DOI: https://doi.org/10.1007/s11042-021-11314-z
