Abstract
Data in real-world applications often follow a long-tailed distribution, which makes long-tailed recognition challenging because conventional models are biased toward the majority categories. Multi-expert ensemble methods have shown promise but often suffer from insufficient expert diversity and high model variance. To address these issues, we propose multiple experts with knowledge fusion (MEKF), which comprises diversified fusion experts and dual-view self-distillation. MEKF enhances expert diversity by fusing features from different network depths and introduces a distribution diversity loss with distribution weights. Dual-view self-distillation reduces model variance by extracting semantic information from predictions on weakly augmented data. Experiments on the CIFAR100-LT, ImageNet-LT, and Places-LT benchmarks validate the effectiveness of MEKF.
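To make the two components concrete, the PyTorch-style sketch below illustrates the general ideas under our own assumptions; the module names, the additive fusion rule, and the temperature are hypothetical and are not taken from the paper. A FusionExpert head fuses features from two backbone depths before classifying, and dual_view_self_distillation distills the detached prediction on a weakly augmented view into the prediction on a strongly augmented view.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FusionExpert(nn.Module):
        # One expert head that fuses a shallow and a deep backbone feature
        # before classification (dimensions and fusion rule are illustrative).
        def __init__(self, shallow_dim, deep_dim, num_classes):
            super().__init__()
            self.proj = nn.Linear(shallow_dim, deep_dim)       # align shallow features
            self.classifier = nn.Linear(deep_dim, num_classes)

        def forward(self, shallow_feat, deep_feat):
            fused = deep_feat + self.proj(shallow_feat)        # simple additive fusion
            return self.classifier(fused)

    def dual_view_self_distillation(logits_strong, logits_weak, temperature=2.0):
        # KL divergence from the detached weak-view prediction to the strong-view
        # prediction; the weak view acts as a teacher to damp prediction variance.
        teacher = F.softmax(logits_weak.detach() / temperature, dim=1)
        student = F.log_softmax(logits_strong / temperature, dim=1)
        return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2

In training, such a distillation term would be added to each expert's classification objective with some weight; the distribution diversity loss mentioned above is not sketched here.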




Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Code availability
The code that supports the findings of this study is available from the corresponding author upon reasonable request.
Acknowledgements
The work described in this paper was supported by the National Natural Science Foundation of China (No. 61673396) and the Natural Science Foundation of Shandong Province (No. ZR2022MF260).
Funding
The work was supported by the National Natural Science Foundation of China (No. 61673396) and the Natural Science Foundation of Shandong Province (No. ZR2022MF260).
Author information
Authors and Affiliations
Contributions
H.L. contributed to data curation; M.S. contributed to funding acquisition; C.H. wrote the original draft; Q.Z. reviewed and edited the manuscript. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Q., Ji, C., Shao, M. et al. MEKF: long-tailed visual recognition via multiple experts with knowledge fusion. J Supercomput 81, 407 (2025). https://doi.org/10.1007/s11227-025-06920-9