A multichannel location-aware interaction network for visual classification

Zhu, Qiangxi; Li, Zhixin; Kuang, Wenlan; Ma, Huifang

doi:10.1007/s10489-023-04734-x

Published: 05 July 2023

Applied Intelligence, Volume 53, pages 23049–23066 (2023)

Qiangxi Zhu¹, Zhixin Li¹ (ORCID: orcid.org/0000-0002-5313-6134), Wenlan Kuang¹ & Huifang Ma²

Abstract

Fine-grained visual classification aims to distinguish images that belong to different subcategories within the same category. The task is challenging because subcategories differ only in subtle regional details. Most existing methods use neural networks to extract global image features and then quickly locate local feature regions by adding various external attention mechanisms. Such approaches may ignore details that are inherent in the feature map itself. This paper proposes an efficient global channel position-aware interaction method to address this problem. Specifically, we first hierarchically group the original features and take advantage of the translation-invariant linearity and local weight sharing of convolutional networks to build a hierarchical structure that enlarges the receptive field of the global features. Then, same-direction location attention interaction is performed on the global features with rich fields of view, encouraging the model to capture common areas of interest through the features' own learning ability. Finally, multiple attention feature maps are obtained from the relative position interactions of the global features. We again use convolutional networks to learn the discriminative features of the attended target regions and perform feature clustering optimization on these regions to guide the classification process. The proposed model performs well on three datasets: CUB-200-2011, Stanford Cars, and FGVC Aircraft.
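
The hierarchical grouping step can be illustrated with a minimal sketch. This is not the authors' implementation (the code is not available); it only shows, under stated assumptions, how splitting channels into groups and chaining a shared convolution across them enlarges the receptive field of later groups. The function names `conv3x3_same` and `hierarchical_group_features`, the group count, and the all-ones kernel are hypothetical choices made for illustration.

```python
import numpy as np

def conv3x3_same(x, kernel):
    """Apply one shared 3x3 kernel to every channel of x (C, H, W), zero padded."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[:, dy:dy + h, dx:dx + w]
    return out

def hierarchical_group_features(x, num_groups, kernel):
    """Assumed hierarchical scheme: split channels into equal groups; each group
    after the first is added to the previous group's output and convolved, so
    later groups aggregate information from a progressively wider area."""
    groups = np.split(x, num_groups, axis=0)
    outputs = [groups[0].astype(float)]  # first group passes through unchanged
    for g in groups[1:]:
        outputs.append(conv3x3_same(g + outputs[-1], kernel))
    return np.concatenate(outputs, axis=0)

# Usage: a single impulse in the first group spreads further in each later group.
x = np.zeros((4, 7, 7))
x[0, 3, 3] = 1.0
y = hierarchical_group_features(x, num_groups=4, kernel=np.ones((3, 3)))
footprint = [int(np.count_nonzero(c)) for c in y]  # nonzero pixels per channel
```

In this toy setting the footprint of the impulse grows from one group to the next, which is the sense in which a hierarchical structure enhances the receptive field; a trained network would use learned kernels and nonlinearities instead of the fixed kernel assumed here.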

Data Availability

The raw/processed data cannot be shared temporarily as the data also forms part of an ongoing study.

Code Availability

Not Applicable.

Funding

This work is supported by the National Natural Science Foundation of China (Nos. 62276073, 61966004), the Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, the Guangxi Talent Highland Project of Big Data Intelligence and Application, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Authors and Affiliations

Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, 541000, China
Qiangxi Zhu, Zhixin Li & Wenlan Kuang
College of Computer Science and Engineering, Northwest Normal University, Lanzhou, 730070, China
Huifang Ma

Contributions

Qiangxi Zhu and Zhixin Li contributed to the research conception and design of the paper, analyzed the data, and wrote the paper. The remaining authors contributed to refining the ideas and approved the final manuscript.

Corresponding author

Correspondence to Zhixin Li.

Ethics declarations

Ethics approval

Not Applicable.

Consent to participate

Not Applicable.

Consent for publication

Not Applicable.

Conflicts of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, Q., Li, Z., Kuang, W. et al. A multichannel location-aware interaction network for visual classification. Appl Intell 53, 23049–23066 (2023). https://doi.org/10.1007/s10489-023-04734-x

Accepted: 26 May 2023
Published: 05 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10489-023-04734-x
