Abstract
Recent works on fine-grained visual categorization rely on detecting discriminative regions that correspond to specific visual patterns. Promising progress has been made by constructing complicated network architectures that capture subtle differences, either explicitly or implicitly, to learn part-level representations. Instead of sophisticated model designs, we consider data augmentation as a learning paradigm for exploiting subtle cues with a single vanilla neural network (e.g., ResNet). However, the powerful regional dropout strategy Cutout, which overlays a square patch at a random location of the input during training, may produce ineffective images for fine-grained classification, because a block of constant size and arbitrary position is potentially too inflexible for the variety of object positions and sizes. To generate more reasonable samples, we propose an enhanced image synthesis strategy called Attentive Cutout, which purposefully conceals informative details by performing attention-guided content sampling on the high responses of feature channels. Since feature channels generally encode a multitude of different cues for a given category, our method is able to select distinct parts to occlude in every iteration. Compared with previous approaches to synthesizing training data, Attentive Cutout ensures greater diversity and attends to part-level features in the generated images. Extensive experiments and analyses demonstrate the effectiveness of our approach, which is efficient, easy to implement, and achieves performance competitive with structure-based methods.
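For concreteness, the idea outlined above (occluding an attention-selected region rather than a purely random one, with the occlusion centre drawn by inverse transform sampling over a channel-response map, as the method's title suggests) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper name `attentive_cutout`, the torchvision ResNet-50 backbone, the use of the single strongest-response channel as the attention map, and the fixed `mask_size` are all illustrative choices.

```python
# A rough sketch of attention-guided cutout, under the assumptions stated above
# (not the authors' released code): a pretrained ResNet-50 supplies feature maps,
# the strongest-response channel serves as the attention map, and the occlusion
# centre is drawn from it by inverse transform sampling.
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()

def attentive_cutout(images: torch.Tensor, mask_size: int = 64) -> torch.Tensor:
    """Occlude one attention-sampled square patch per image (hypothetical helper).

    `images` is a (B, 3, H, W) batch, normalized as expected by the backbone.
    """
    B, _, H, W = images.shape
    with torch.no_grad():
        feats = feature_extractor(images)                      # (B, C, h, w)
        # Use the channel with the highest mean activation as the attention map.
        top = feats.mean(dim=(2, 3)).argmax(dim=1)             # (B,)
        attn = feats[torch.arange(B), top]                     # (B, h, w)
        attn = F.interpolate(attn.unsqueeze(1), size=(H, W),
                             mode="bilinear", align_corners=False).squeeze(1)
        probs = attn.clamp(min=0).flatten(1)
        probs = probs / probs.sum(dim=1, keepdim=True).clamp(min=1e-8)
        # Inverse transform sampling: invert the CDF of the attention
        # distribution to draw the centre of the patch to be concealed.
        cdf = probs.cumsum(dim=1)
        u = torch.rand(B, 1, device=images.device)
        idx = torch.searchsorted(cdf, u).squeeze(1).clamp(max=H * W - 1)
        cy, cx = idx // W, idx % W

    out = images.clone()
    half = mask_size // 2
    for i in range(B):
        y0, y1 = max(int(cy[i]) - half, 0), min(int(cy[i]) + half, H)
        x0, x1 = max(int(cx[i]) - half, 0), min(int(cx[i]) + half, W)
        out[i, :, y0:y1, x0:x1] = 0.0   # black out the sampled informative region
    return out

# Usage: augmented = attentive_cutout(batch); train the vanilla network on it.
```

Because the occlusion centre is sampled from the attention distribution rather than fixed at its peak, different informative parts can be hidden in different iterations, which matches the diversity argument made in the abstract.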







Acknowledgements
We are very grateful to the anonymous reviewers for their valuable comments and suggestions. This work is supported by grants from the National Natural Science Foundation of China (Nos. 62076116 and 61672272) and the Natural Science Foundation of Fujian Province (Nos. 2020J01811 and 2020J01792).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Guo, C., Lin, Y., Xu, M. et al. Inverse transformation sampling-based attentive cutout for fine-grained visual recognition. Vis Comput 39, 2597–2608 (2023). https://doi.org/10.1007/s00371-022-02481-7