DCMA-Net: dual cross-modal attention for fine-grained few-shot recognition

Zhou, Yan; Ren, Xiao; Li, Jianxun; Yang, Yin; Zhou, Haibin

doi:10.1007/s11042-023-15776-1

Published: 30 June 2023

Volume 83, pages 14521–14537, (2024)

Multimedia Tools and Applications

Yan Zhou ORCID: orcid.org/0000-0002-2372-4947¹,
Xiao Ren¹,
Jianxun Li²,
Yin Yang³ &
…
Haibin Zhou

Abstract

Because comprehensively labeled samples are expensive to obtain, the Fine-grained Few-shot Recognition task aims to identify unseen meta classes using only one or a few labeled samples. Fine-grained Recognition also faces challenges such as minimal inter-class variation and background clutter, and most previous methods rely on a single visual modality. In this paper, we propose a novel Dual Cross-modal Attention Network (DCMA-Net) to address these problems. Concretely, we first propose the Local Mutuality Attention branch, which encodes contextual information by merging cross-modal information in order to learn more discriminative information and increase inter-class differences. Meanwhile, we add a regularization mechanism that filters the visual features matching the attribute information to ensure effective learning. Because focusing on local features risks ignoring instance-level information, we also propose the Global Correlation Attention branch, which obtains a detail-activated representation by globally pooling the visual features serially along the spatial and channel dimensions. As the counterpart of the Local Mutuality Attention branch, this branch avoids learning bias. The outputs of the two branches are then aggregated into an integral feature embedding, which can be used to enhance the prototypes. Extensive experiments on the CUB and SUN datasets demonstrate that our framework is effective. In particular, our method improves the accuracy of the Prototype Network from 51.31 to 77.67 in the 5-way 1-shot scenario on the CUB dataset with a Conv-4 backbone.
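
For readers unfamiliar with the prototype-based baseline that the abstract compares against, the following is a minimal sketch of prototypical classification in the style of Snell et al. (2017): each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. It does not implement either DCMA-Net attention branch, whose details the abstract does not give. The function names `class_prototypes` and `classify`, the toy 2-D embeddings, and the use of Euclidean distance are assumptions made for illustration only.

```python
import math


def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype."""
    sums, counts = {}, {}
    for vec, label in zip(support_embeddings, support_labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(query_embedding, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(
        prototypes,
        key=lambda label: math.dist(query_embedding, prototypes[label]),
    )


# Toy 3-way 2-shot episode with hand-made 2-D embeddings (hypothetical values,
# standing in for the output of a feature extractor such as Conv-4).
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0], [0.0, 2.0], [0.0, 2.2]]
labels = ["a", "a", "b", "b", "c", "c"]

prototypes = class_prototypes(support, labels)
prediction = classify([1.0, 0.9], prototypes)
print(prediction)
```

In a 1-shot episode each prototype is a single support embedding, which is why the quality of that embedding matters so much; according to the abstract, DCMA-Net's aggregated embedding is intended to enhance the prototypes at this step.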

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Acknowledgements

This work was supported in part by the National Key Research and Development Project under Grant 2020YFA0713503, in part by the National Natural Science Foundation of China under Grant 61773330, and in part by the Aeronautical Science Foundation of China under Grant 20200020114004.

Author information

Authors and Affiliations

School of Automation and Electronic Information, Xiangtan University, 411105, Xiangtan, China
Yan Zhou & Xiao Ren
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 200240, Shanghai, China
Jianxun Li
School of Mathematics and Computational Science, Xiangtan University, 411105, Xiangtan, China
Yin Yang

Corresponding author

Correspondence to Yan Zhou.

Ethics declarations

Competing Interests

The authors state that they have no conflicting financial interests or personal connections that may have influenced the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, Y., Ren, X., Li, J. et al. DCMA-Net: dual cross-modal attention for fine-grained few-shot recognition. Multimed Tools Appl 83, 14521–14537 (2024). https://doi.org/10.1007/s11042-023-15776-1

Received: 17 November 2022
Revised: 20 March 2023
Accepted: 25 April 2023
Published: 30 June 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11042-023-15776-1
