Feature channel interaction long-tailed image classification model based on dual attention

Liao, Kaiyang; Wang, Keer; Zheng, Yuanlin; Lin, Guangfeng; Cao, Congjun

doi:10.1007/s11760-023-02848-w

Original Paper
Published: 29 November 2023

Signal, Image and Video Processing, Volume 18, pages 1661–1670 (2024)

Kaiyang Liao¹, Keer Wang¹, Yuanlin Zheng¹, Guangfeng Lin¹ & Congjun Cao¹,²

Abstract

Real-world data often follow a long-tailed distribution, and this imbalance biases the learned model toward the head classes. To address the influence of long-tailed distributions on image classification, this paper proposes a feature channel interaction long-tailed image classification model based on dual attention. First, a dual attention module captures the autocorrelation and spatial-dimension information of the feature map, and enhanced images are obtained through transformation and class activation maps. Image preprocessing is then applied to the enhanced data set to reduce over-fitting of the model to the head classes, so that features more conducive to tail-class classification are learned. Finally, the correlation between channels is extracted by letting each feature channel interact with its adjacent local channels, which yields more robust features. The method achieves good performance on the CIFAR10-LT, CIFAR100-LT and ImageNet datasets, which demonstrates the effectiveness of the model.
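
The abstract does not spell out how the interaction between adjacent local channels is computed. One common way to realize such an interaction is to pool each channel into a single descriptor, run a small 1D convolution across the channel axis, and use the squashed result to reweight the channels. The sketch below illustrates that idea only; the function names, the zero padding, the sigmoid gate and the kernel values are all assumptions made for illustration and are not taken from the paper.

```python
import math


def global_avg_pool(feature_map):
    """Reduce each channel (an H x W grid of floats) to its mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def local_channel_interaction(descriptors, kernel):
    """Mix each channel descriptor with its neighbours along the channel axis.

    This is a 1D sliding-window sum with zero padding, so the output has
    the same length as the input. The kernel length must be odd.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    padded = [0.0] * pad + list(descriptors) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(descriptors))
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, kernel):
    """Reweight every channel by a gate computed from its local neighbourhood.

    Hypothetical helper: the gating scheme is an assumption, not the
    authors' published module.
    """
    descriptors = global_avg_pool(feature_map)
    mixed = local_channel_interaction(descriptors, kernel)
    gates = [sigmoid(value) for value in mixed]
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
```

With a kernel of length 3, each gate depends only on a channel and its two immediate neighbours, so the number of extra parameters stays independent of the channel count. In a trained network the kernel values would be learned rather than fixed by hand.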

Data availability

Some or all data, models, or code generated or used during the study are available from the corresponding author by request.

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

School of Printing, Packaging and Digital Media, Xi’an University of Technology, Xi’an, 710048, Shaanxi Province, China
Kaiyang Liao, Keer Wang, Yuanlin Zheng, Guangfeng Lin & Congjun Cao
Printing and Packaging Engineering Technology Research Centre of Shaanxi Province, Xi’an, 710048, China
Congjun Cao

Contributions

K.W. and K.L. conducted research and wrote the paper. All authors made substantial contributions to the concept, design, and revision of the paper.

Corresponding author

Correspondence to Keer Wang.

Ethics declarations

Conflict of interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liao, K., Wang, K., Zheng, Y. et al. Feature channel interaction long-tailed image classification model based on dual attention. SIViP 18, 1661–1670 (2024). https://doi.org/10.1007/s11760-023-02848-w

Received: 09 September 2023
Revised: 08 October 2023
Accepted: 14 October 2023
Published: 29 November 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11760-023-02848-w
