Abstract
Fine-grained visual classification aims to distinguish images that belong to different subcategories of the same category. The task is challenging because subcategories differ only in subtle regional details. Most existing methods use neural networks to extract global image features and then quickly locate local feature regions by adding various external attention mechanisms. Such approaches may ignore details that are inherent in the feature map itself. This paper proposes an efficient global channel position-aware interaction method to address this problem. Specifically, we first hierarchically group the original features and exploit the translation-invariant linearity and local weight sharing of convolutional networks to build a hierarchical structure that enlarges the receptive field of the global features. Then, same-direction location attention interaction is performed on the global features with their enriched fields of view, encouraging the model to capture common areas of interest through the features' own learning ability. Finally, multiple attention feature maps are obtained from the relative position interactions of the global features. We again use convolutional networks to learn the discriminative features of the attended target regions and perform feature clustering optimization on these discriminative regions to guide the classification process. The proposed model performs well on three datasets: CUB-200-2011, Stanford Cars, and FGVC Aircraft.
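The abstract does not spell out how the hierarchical grouping is wired, so the following is only a minimal sketch of the general idea that chaining convolutions across channel groups enlarges the receptive field. It assumes a Res2Net-style connection (each group is filtered together with the previous group's output), works on 1D signals, and replaces learned convolution weights with a fixed 3-tap box filter. The names `conv3`, `hierarchical_groups`, and `support` are hypothetical and do not come from the paper.

```python
from typing import List


def conv3(signal: List[float]) -> List[float]:
    """3-tap box filter with zero padding (assumed stand-in for a learned conv)."""
    padded = [0.0] + signal + [0.0]
    return [padded[i] + padded[i + 1] + padded[i + 2] for i in range(len(signal))]


def hierarchical_groups(groups: List[List[float]]) -> List[List[float]]:
    """Process channel groups in order.

    The first group passes through unchanged. Every later group is summed
    with the previous group's filtered output (if any) and then filtered,
    so each step widens the receptive field by one filter radius per side.
    """
    outputs = [groups[0][:]]
    prev = None
    for group in groups[1:]:
        mixed = group if prev is None else [a + b for a, b in zip(group, prev)]
        prev = conv3(mixed)
        outputs.append(prev)
    return outputs


def support(signal: List[float]) -> int:
    """Number of non-zero positions, used here as a proxy for receptive field."""
    return sum(1 for v in signal if v != 0.0)


# Four identical groups, each holding a single impulse at the center.
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
outs = hierarchical_groups([impulse[:] for _ in range(4)])
widths = [support(o) for o in outs]
print(widths)  # [1, 3, 5, 7]
```

With an impulse input, the support of each successive group's output grows from 1 to 7 positions, which is the receptive-field enlargement this kind of hierarchy is meant to provide. A real model would use learned 2D convolutions over feature maps rather than a fixed 1D filter.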
Data Availability
The raw/processed data cannot be shared temporarily as the data also forms part of an ongoing study.
Code Availability
Not Applicable.
Funding
This work is supported by the National Natural Science Foundation of China (Nos. 62276073, 61966004), the Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, the Guangxi Talent Highland Project of Big Data Intelligence and Application, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Author information
Contributions
Qiangxi Zhu and Zhixin Li contributed to the research conception and design of the paper, analyzed the data, and wrote the paper. The remaining authors contributed to refining the ideas and approved the final manuscript.
Ethics declarations
Ethics approval
Not Applicable.
Consent to participate
Not Applicable.
Consent for publication
Not Applicable.
Conflicts of interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhu, Q., Li, Z., Kuang, W. et al. A multichannel location-aware interaction network for visual classification. Appl Intell 53, 23049–23066 (2023). https://doi.org/10.1007/s10489-023-04734-x
DOI: https://doi.org/10.1007/s10489-023-04734-x