A multichannel location-aware interaction network for visual classification

Applied Intelligence

Abstract

Fine-grained visual classification aims to distinguish images that belong to different subcategories of the same category. The task is challenging because subcategories differ only in subtle local regions. Most existing methods use neural networks to extract global image features and then locate local feature regions by adding external attention mechanisms. Such approaches may ignore details that are inherent in the feature map itself. To address this problem, this paper proposes an efficient global channel position-aware interaction method. Specifically, we first group the original features hierarchically and exploit the translation-invariant linearity and local weight sharing of convolutional networks to build a hierarchical structure that enlarges the receptive field of the global features. Same-direction location attention interaction is then performed on these global features with their rich fields of view, encouraging the model to capture common areas of interest through the features' own learning ability. Finally, multiple attention feature maps are obtained from the relative position interactions of the global features. We again use convolutional networks to learn discriminative features of the attended target regions and apply feature clustering optimization to these regions to guide the classification process. The proposed model performs well on three datasets: CUB-200-2011, Stanford Cars, and FGVC Aircraft.
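
The abstract does not specify how the hierarchical grouping is built, so the following is only a minimal sketch of one way that grouping channels hierarchically can enlarge the receptive field. It assumes a Res2Net-style chain in which each channel group is added to the previous group's output before being filtered; this is an assumption, not the authors' architecture. The function names (`box3x3`, `hierarchical_groups`, `support_width`) are hypothetical, and a fixed 3x3 mean filter stands in for a learned convolution.

```python
# Hypothetical sketch: hierarchical (Res2Net-style) channel grouping.
# Assumption: a fixed 3x3 mean filter replaces a learned convolution.

def box3x3(channel):
    """Apply a zero-padded 3x3 mean filter to one 2D channel."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += channel[y][x]
            out[i][j] = total / 9.0
    return out


def add(a, b):
    """Element-wise sum of two 2D channels of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def hierarchical_groups(channels, num_groups):
    """Split channels into groups; every group after the first is
    summed with the previous group's output and then filtered."""
    assert len(channels) % num_groups == 0
    size = len(channels) // num_groups
    groups = [channels[k * size:(k + 1) * size] for k in range(num_groups)]
    outputs = [groups[0]]  # the first group passes through unchanged
    for group in groups[1:]:
        prev = outputs[-1]
        outputs.append([box3x3(add(c, p)) for c, p in zip(group, prev)])
    return [c for group in outputs for c in group]


def support_width(channel):
    """Count the columns that contain at least one non-zero value."""
    cols = range(len(channel[0]))
    return sum(1 for j in cols if any(row[j] != 0.0 for row in channel))


def impulse(n):
    """An n x n channel that is zero except for a 1 at the centre."""
    m = [[0.0] * n for _ in range(n)]
    m[n // 2][n // 2] = 1.0
    return m


# Four channels, one per group, each a single centred impulse.
features = [impulse(9) for _ in range(4)]
out = hierarchical_groups(features, num_groups=4)
widths = [support_width(c) for c in out]
print(widths)
```

In this toy setting, each later group responds to a wider neighbourhood of the input than the one before it, which is the sense in which a hierarchical chain of small filters gives later channels a larger field of view.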

Data Availability

The raw/processed data cannot be shared at this time, as they also form part of an ongoing study.

Code Availability

Not Applicable.

Funding

This work is supported by the National Natural Science Foundation of China (Nos. 62276073, 61966004), the Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, the Guangxi Talent Highland Project of Big Data Intelligence and Application, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Contributions

Qiangxi Zhu and Zhixin Li contributed to the research conception and design of the paper, analyzed the data, and wrote the paper. The remaining authors contributed to refining the ideas and approved the final manuscript.

Corresponding author

Correspondence to Zhixin Li.

Ethics declarations

Ethics approval

Not Applicable.

Consent to participate

Not Applicable.

Consent for publication

Not Applicable.

Conflicts of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, Q., Li, Z., Kuang, W. et al. A multichannel location-aware interaction network for visual classification. Appl Intell 53, 23049–23066 (2023). https://doi.org/10.1007/s10489-023-04734-x

  • DOI: https://doi.org/10.1007/s10489-023-04734-x
