Research article | Open access | DOI: 10.1145/3664647.3680799

Parameter-Efficient Complementary Expert Learning for Long-Tailed Visual Recognition

Published: 28 October 2024

Abstract

Long-tailed recognition (LTR) aims to learn balanced models from extremely imbalanced training data. Fine-tuning pretrained foundation models has recently emerged as a promising research direction for LTR. However, we observe that the fine-tuning process tends to degrade the intrinsic representation capability of pretrained models and to bias the model towards certain classes, thereby hindering overall recognition performance. To unleash the intrinsic representation capability of pretrained foundation models, in this work we propose a new Parameter-Efficient Complementary Expert Learning (PECEL) framework for LTR. Specifically, PECEL consists of multiple experts, where each expert is trained via Parameter-Efficient Fine-Tuning (PEFT) and encouraged to acquire different expertise on complementary sub-categories via the proposed sample-aware logit adjustment loss. By aggregating the predictions of the different experts, PECEL achieves balanced performance across long-tailed classes. Nevertheless, learning multiple experts generally introduces extra trainable parameters. To ensure parameter efficiency, we further propose a parameter-sharing strategy which decomposes and shares the parameters within each expert. Extensive experiments on four LTR benchmarks show that the proposed PECEL can effectively learn multiple complementary experts without increasing the number of trainable parameters, and achieves new state-of-the-art performance.
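Two of the building blocks the abstract relies on, a logit-adjustment loss that counters class-frequency bias and the averaging of per-expert predictions, can be sketched in a few lines. This is an illustrative sketch of the standard versions of these components (logit adjustment in the sense of Menon et al., 2021), not the paper's exact sample-aware variant; the function names are my own.

```python
import math

def logit_adjusted_loss(logits, label, class_counts, tau=1.0):
    """Cross-entropy with logit adjustment: shifting each logit by
    tau * log(class prior) penalises over-confident predictions on
    frequent (head) classes and so rebalances the learned decision
    boundary. With a uniform class distribution it reduces to plain
    cross-entropy."""
    total = sum(class_counts)
    adjusted = [z + tau * math.log(c / total)
                for z, c in zip(logits, class_counts)]
    log_norm = math.log(sum(math.exp(a) for a in adjusted))
    return log_norm - adjusted[label]

def aggregate_experts(expert_logits):
    """Average per-expert logit vectors into one final prediction,
    as in the expert-aggregation step described in the abstract."""
    n_experts = len(expert_logits)
    n_classes = len(expert_logits[0])
    return [sum(e[j] for e in expert_logits) / n_experts
            for j in range(n_classes)]
```

Note how, for identical logits, the adjusted loss on a tail-class label grows with the imbalance of `class_counts`: the adjustment term makes rare classes harder to predict during training, which forces the model to allocate more capacity to them.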



Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024, 11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. long-tailed recognition
    2. multiple experts
    3. parameter-efficient fine-tuning


Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne, VIC, Australia

    Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions (26%)
Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)
