Skip to main content
Log in

Inverse transformation sampling-based attentive cutout for fine-grained visual recognition

  • Original article
  • Published:
The Visual Computer Aims and scope Submit manuscript

Abstract

Recent works on fine-grained visual categorization rely on detecting discriminative regions that correspond to specific visual patterns. Promising progress has been obtained by constructing complicated network architecture, which either involves explicitly or implicitly capturing subtle differences to learn part-level representations. Instead of sophisticated model designs, we consider the learning paradigm of data augmentation to utilize subtle cues through a single vanilla neural network (e.g., ResNet). However, the powerful regional dropout strategy, Cutout, which randomly overlays a square patch of inputs in training, may produce inefficient images for fine-grained classification. This is because constant and causal block extent is potentially inflexible for the variety of object position and size. To generate more reasonable samples, we propose an enhanced image synthesis strategy called Attentive Cutout, which purposefully conceals informative details by performing attention-guided content sampling on the high responses from channels. As the feature channels generally represent the multitude of different clues for specified categories, our method is capable of selecting distinct parts to occlude in every iteration. Compare with previous synthesizing training data approaches, Attentive Cutout ensures more diversity and attends to part-level features among generated images. Extensive experiments and analysis studies demonstrate the effectiveness of our approach, which is efficient but easy to implement and achieves competitive performance with structure-based methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

References

  1. Berg, T., Belhumeur, P.N.: Poof: Part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 955–962 (2013)

  2. Cai, S., Zuo, W., Zhang, L.: Higher-order integration of hierarchical convolutional activations for fine-grained visual categorization. In: Proceedings of the IEEE Conference on Computer Vision, pp. 511–520 (2017)

  3. Chai, Y., Lempitsky, V., Zisserman, A.: Symbiotic segmentation and part localization for fine-grained categorization. In: Proceedings of the International Conference on Computer Vision, pp. 321–328 (2013)

  4. Chang, D., Ding, Y., Xie, J., Bhunia, A.K., Li, X., Ma, Z., Wu, M., Guo, J., Song, Y.Z.: The devil is in the channels: mutual-channel loss for fine-grained image classification. IEEE Trans. Image Process. 29, 4683–4695 (2020)

    Article  MATH  Google Scholar 

  5. Chen, Y., Bai, Y., Zhang, W., Mei, T.: Destruction and construction learning for fine-grained image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5157–5166 (2019)

  6. Cui, Y., Zhou, F., Wang, J., Liu, X., Lin, Y., Belongie, S.: Kernel pooling for convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3049–3058 (2017)

  7. DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout (2017). arXiv preprint arXiv:1708.04552

  8. Devroye, L.: Sample-based non-uniform random variate generation. In: Proceedings of the Conference on Winter Simulation, pp. 260–265 (1986)

  9. Ding, Y., Zhou, Y., Zhu, Y., Ye, Q., Jiao, J.: Selective sparse sampling for fine-grained image recognition. In: Proceedings of the IEEE Conference on Computer Vision, pp. 6598–6607 (2019)

  10. Dubey, A., Gupta, O., Guo, P., Raskar, R., Farrell, R., Naik, N.: Pairwise confusion for fine-grained visual classification. In: Proceedings of the IEEE European Conference on Computer Vision, pp. 70–86 (2018)

  11. Fu, J., Zheng, H., Mei, T.: Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4476–4484 (2017)

  12. Gao, Y., Beijbom, O., Zhang, N., Darrell, T.: Compact bilinear pooling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 317–326 (2016)

  13. Gavves, E., Fernando, B., Snoek, C.G.M., Smeulders, A.W.M., Tuytelaars, T.: Fine-grained categorization by alignments. In: Proceedings of the IEEE Conference on International Conference on Computer Vision, pp. 1713–1720 (2013)

  14. Ge, W., Lin, X., Yu, Y.: Weakly supervised complementary parts models for fine-grained image classification from the bottom up. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3034–3043 (2019)

  15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  16. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of The International Conference on Machine Learning, pp. 448–456 (2015)

  17. Kong, S., Fowlkes, C.: Low-rank bilinear pooling for fine-grained classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7025–7034 (2017)

  18. Krause, J., Jin, H., Yang, J., Fei-Fei, L.: Fine-grained recognition without part annotations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5546–5555 (2015)

  19. Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 554–561 (2013)

  20. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

  21. Lam, M., Mahasseni, B., Todorovic, S.: Fine-grained recognition as hsnet search for informative image parts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)

  22. Lin, T.Y., RoyChowdhury, A., Maji, S.: Bilinear cnn models for fine-grained visual recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1449–1457 (2015)

  23. Liu, C., Huang, L., Wei, Z., Zhang, W.: Subtler mixed attention network on fine-grained image classification. Appl. Intell. 1–14 (2021)

  24. Maji, S., Kannala, J., Rahtu, E., Blaschko, M., Vedaldi, A.: Fine-Grained Visual Classification of Aircraft. Technical Report (2013). arXiv:1306.5151

  25. Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: Proceedings of the IEEE Indian Conference on Computer Vision, Graphics and Image Processing, pp. 722–729 (2008)

  26. Niu, Y., Jiao, Y., Shi, G.: Attention-shift based deep neural network for fine-grained visual categorization. Pattern Recogn. 116, 107947 (2021)

    Article  Google Scholar 

  27. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015)

    Article  MathSciNet  Google Scholar 

  28. Peng, Y., He, X., Zhao, J.: Object-part attention model for fine-grained image classification. IEEE Trans. Image Process. 27, 1487–1500 (2018)

    Article  MathSciNet  MATH  Google Scholar 

  29. Sedik, A., Hammad, M., Abd El-Samie, F.E., et al.: Efficient deep learning approach for augmented detection of Coronavirus disease. Neural Comput. Appl. 1–18 (2021)

  30. Sedik, A., Iliyasu, A.M., El-Rahiem, A., et al.: Deploying machine and deep learning models for efficient data-augmented detection of COVID-19 infections. Viruses 12(7), 769 (2020)

    Article  Google Scholar 

  31. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)

  32. Song, K., Wei, X., Shu, X., Song, R., Lu, J.: Bi-modal progressive mask attention for fine-grained recognition. IEEE Trans. Image Process. 1–1 (2020)

  33. Sun, G., Cholakkal, H., Khan, S., Khan, F.S., Shao, L.: Fine-grained recognition: accounting for subtle differences between similar classes. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 12047–12054 (2020)

  34. Tokozume, Y., Ushiku, Y., Harada, T.: Between-class learning for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5486–5494 (2018)

  35. Uijlings, J.R., Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104, 154–171 (2013)

    Article  Google Scholar 

  36. Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Lopez-Paz, D., Bengio, Y.: Manifold mixup: better representations by interpolating hidden states. In: Proceedings of International Conference on Machine Learning, pp. 6438–6447 (2019)

  37. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 Dataset (2011)

  38. Walawalkar, D., Shen, Z., Liu, Z., et al.: Attentive CutMix: an enhanced data augmentation approach for deep learning based image classification. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (2020)

  39. Wang, D., Shen, Z., Shao, J., Zhang, W., Xue, X., Zhang, Z.: Multiple granularity descriptors for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2399–2406 (2015)

  40. Wang, Y., Morariu, V.I., Davis, L.S.: Learning a discriminative filter bank within a cnn for fine-grained recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4148–4157 (2018)

  41. Wei, X.S., Xie, C.W., Wu, J., Shen, C.: Mask-cnn: localizing parts and selecting descriptors for fine-grained bird species categorization. Pattern Recogn. 76, 704–714 (2018)

    Article  Google Scholar 

  42. Yang, Z., Luo, T., Wang, D., Hu, Z., Gao, J., Wang, L.: Learning to navigate for fine-grained classification. In: Proceedings of the IEEE European Conference on Computer Vision, pp. 438–454 (2018)

  43. Yun, S., Han, D., Chun, S., Oh, S.J., Yoo, Y., Choe, J.: CutMix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6023–6032 (2019)

  44. Yu, X., Zhao, Y., Gao, Y., Xiong, S.: MaskCOV: a random mask covariance network for ultra-fine-grained visual categorization. Pattern Recogn. 119, 108067 (2021)

    Article  Google Scholar 

  45. Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: Mixup: beyond empirical risk minimization. In: Proceedings of International Conference on Learning Representations (2017)

  46. Zhang, H., Xu, T., Elhoseiny, M., Huang, X., Zhang, S., Elgammal, A., Metaxas, D.: Spda-cnn: unifying semantic part detection and abstraction for fine-grained recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1143–1152 (2016)

  47. Zhang, L., Huang, S., Liu, W., Tao, D.: Learning a mixture of granularity-specific experts for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 8331–8340 (2019)

  48. Zhang, Y., Wei, X.S., Wu, J., Cai, J., Lu, J., Nguyen, V.A., Do, M.N.: Weakly supervised fine-grained categorization with part-based image representation. IEEE Trans. Image Process. 25, 1713–1725 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  49. Zheng, H., Fu, J., Mei, T., Luo, J.: Learning multi-attention convolutional neural network for fine-grained image recognition. In: Proceedings of the International Conference on Computer Vision, pp. 5219–5227 (2017)

  50. Zheng, H., Fu, J., Zha, Z.J., Luo, J.: Looking for the devil in the details: learning trilinear attention sampling network for fine-grained image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5012–5021 (2019)

  51. Zheng, H., Fu, J., Zha, Z.J., Luo, J., Mei, T.: Learning rich part hierarchies with progressive attention networks for fine-grained image recognition. IEEE Trans. Image Process. 29, 476–488 (2020)

    Article  MathSciNet  MATH  Google Scholar 

  52. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 13001–13008 (2020)

  53. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)

Download references

Acknowledgements

We are very grateful to the anonymous reviewers for their valuable comments and suggestions. This work is supported by Grants from the National Natural Science Foundation of China (Nos. 62076116, and 61672272), the Natural Science Foundation of Fujian Province (Nos. 2020J01811 and 2020J01792).

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Yaojin Lin or Junfeng Yao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Guo, C., Lin, Y., Xu, M. et al. Inverse transformation sampling-based attentive cutout for fine-grained visual recognition. Vis Comput 39, 2597–2608 (2023). https://doi.org/10.1007/s00371-022-02481-7

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00371-022-02481-7

Keywords

Navigation