Abstract
Recent works on fine-grained visual categorization rely on detecting discriminative regions that correspond to specific visual patterns. Promising progress has been made by constructing complicated network architectures that capture subtle differences, either explicitly or implicitly, to learn part-level representations. Instead of sophisticated model designs, we consider data augmentation as a learning paradigm for exploiting subtle cues with a single vanilla neural network (e.g., ResNet). However, the powerful regional dropout strategy Cutout, which overlays a square patch at a random location of the input during training, may produce ineffective images for fine-grained classification, because a block of constant size and arbitrary position is potentially too inflexible for the variety of object positions and sizes. To generate more reasonable samples, we propose an enhanced image synthesis strategy called Attentive Cutout, which purposefully conceals informative details by performing attention-guided content sampling on the high responses of feature channels. Since feature channels generally encode a multitude of different cues for a given category, our method is able to select distinct parts to occlude in every iteration. Compared with previous approaches to synthesizing training data, Attentive Cutout ensures greater diversity and attends to part-level features in the generated images. Extensive experiments and analyses demonstrate the effectiveness of our approach, which is efficient, easy to implement, and achieves performance competitive with structure-based methods.
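For concreteness, the idea outlined above (occluding an attention-selected region rather than a purely random one, with the occlusion centre drawn by inverse transform sampling over a channel-response map, as the method's title suggests) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper name `attentive_cutout`, the torchvision ResNet-50 backbone, the use of the single strongest-response channel as the attention map, and the fixed `mask_size` are all illustrative choices.

```python
# A rough sketch of attention-guided cutout, under the assumptions stated above
# (not the authors' released code): a pretrained ResNet-50 supplies feature maps,
# the strongest-response channel serves as the attention map, and the occlusion
# centre is drawn from it by inverse transform sampling.
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()

def attentive_cutout(images: torch.Tensor, mask_size: int = 64) -> torch.Tensor:
    """Occlude one attention-sampled square patch per image (hypothetical helper).

    `images` is a (B, 3, H, W) batch, normalized as expected by the backbone.
    """
    B, _, H, W = images.shape
    with torch.no_grad():
        feats = feature_extractor(images)                      # (B, C, h, w)
        # Use the channel with the highest mean activation as the attention map.
        top = feats.mean(dim=(2, 3)).argmax(dim=1)             # (B,)
        attn = feats[torch.arange(B), top]                     # (B, h, w)
        attn = F.interpolate(attn.unsqueeze(1), size=(H, W),
                             mode="bilinear", align_corners=False).squeeze(1)
        probs = attn.clamp(min=0).flatten(1)
        probs = probs / probs.sum(dim=1, keepdim=True).clamp(min=1e-8)
        # Inverse transform sampling: invert the CDF of the attention
        # distribution to draw the centre of the patch to be concealed.
        cdf = probs.cumsum(dim=1)
        u = torch.rand(B, 1, device=images.device)
        idx = torch.searchsorted(cdf, u).squeeze(1).clamp(max=H * W - 1)
        cy, cx = idx // W, idx % W

    out = images.clone()
    half = mask_size // 2
    for i in range(B):
        y0, y1 = max(int(cy[i]) - half, 0), min(int(cy[i]) + half, H)
        x0, x1 = max(int(cx[i]) - half, 0), min(int(cx[i]) + half, W)
        out[i, :, y0:y1, x0:x1] = 0.0   # black out the sampled informative region
    return out

# Usage: augmented = attentive_cutout(batch); train the vanilla network on it.
```

Because the occlusion centre is sampled from the attention distribution rather than fixed at its peak, different informative parts can be hidden in different iterations, which matches the diversity argument made in the abstract.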







Acknowledgements
We are very grateful to the anonymous reviewers for their valuable comments and suggestions. This work is supported by grants from the National Natural Science Foundation of China (Nos. 62076116 and 61672272) and the Natural Science Foundation of Fujian Province (Nos. 2020J01811 and 2020J01792).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Guo, C., Lin, Y., Xu, M. et al. Inverse transformation sampling-based attentive cutout for fine-grained visual recognition. Vis Comput 39, 2597–2608 (2023). https://doi.org/10.1007/s00371-022-02481-7