Dropout Mixture Low-Rank Adaptation for Visual Parameters-Efficient Fine-Tuning

  • Conference paper
  • Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Parameter-efficient fine-tuning methods adjust a small subset of parameters in large models, achieving performance comparable to or even surpassing that of models fine-tuned with the full parameter set, while significantly reducing the time and computational cost of fine-tuning. Despite the development of parameter-efficient fine-tuning methods for large models, we observe significant performance disparities across different vision tasks. We attribute this pronounced performance variability to the insufficient robustness of current parameter-efficient fine-tuning methods. In this paper, we propose a robust reparameterization framework for parameter-efficient fine-tuning. The framework has a dynamic training structure and introduces no additional computational overhead during the inference stage. Specifically, we propose Dropout-Mixture Low-Rank Adaptation (DMLoRA), which incorporates multiple up and down branches to provide the model with a more robust gradient descent path. As training proceeds, DMLoRA gradually drops out branches to strike a balance between accuracy and regularization. Additionally, we employ a 2-Stage Learning Scalar (LS) strategy to optimize the scale factor of each layer's DMLoRA module. Experimental results demonstrate that our method achieves state-of-the-art performance on the VTAB-1k and FGVC benchmarks for parameter-efficient fine-tuning.
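
The abstract describes DMLoRA only at a high level, so the following PyTorch sketch is one illustrative reading of it, not the authors' implementation: the branch count, the zero-initialization, the averaging over active branches, the branch-dropping schedule, and the merge step are all assumptions.

```python
# Hypothetical sketch of a DMLoRA-style module, inferred only from the abstract.
import torch
import torch.nn as nn


class DMLoRALinear(nn.Module):
    """Frozen linear layer plus a mixture of low-rank (down/up) branches.

    Several LoRA-style branch pairs provide a smoother gradient path early in
    training; branches are gradually dropped as training proceeds; and the
    surviving update is reparameterized into the frozen weight so inference
    pays no extra cost.
    """

    def __init__(self, base: nn.Linear, rank: int = 4, num_branches: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # only branches and the scale are tuned
        d_in, d_out = base.in_features, base.out_features
        self.downs = nn.ModuleList(
            nn.Linear(d_in, rank, bias=False) for _ in range(num_branches))
        self.ups = nn.ModuleList(
            nn.Linear(rank, d_out, bias=False) for _ in range(num_branches))
        for up in self.ups:
            nn.init.zeros_(up.weight)        # update starts at zero, as in LoRA
        self.scale = nn.Parameter(torch.ones(1))  # per-layer learnable scalar
        self.active = num_branches           # branches still in the mixture

    def drop_branch(self) -> None:
        """Deactivate one branch; call on a schedule as training proceeds."""
        self.active = max(1, self.active - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        if self.active == 0:                 # branches already merged
            return out
        delta = sum(self.ups[i](self.downs[i](x)) for i in range(self.active))
        return out + self.scale * delta / self.active

    @torch.no_grad()
    def merge(self) -> None:
        """Fold the remaining branches into the frozen weight for inference."""
        for i in range(self.active):
            self.base.weight += (self.scale / self.active) * (
                self.ups[i].weight @ self.downs[i].weight)
        self.active = 0


# Example: wrap a ViT-style projection and run a forward pass.
layer = DMLoRALinear(nn.Linear(768, 768), rank=4, num_branches=4)
y = layer(torch.randn(2, 197, 768))          # (batch, tokens, dim)
```

Under this reading, a 2-Stage Learning Scalar schedule might first train the branches with `scale` frozen and then tune `scale` alone; the paper's actual procedure may differ.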

Z. Fang and Y. Wang—Equal Contribution.



Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 62302297, 72192821, 62272447), the Shanghai Sailing Program (22YF1420300), the Young Elite Scientists Sponsorship Program by CAST (2022QNRC001), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), the Shanghai Science and Technology Commission (21511101200), the Fundamental Research Funds for the Central Universities (YG2023QNB17, YG2024QNA44), and the Beijing Natural Science Foundation (L222117).

Author information


Corresponding author

Correspondence to Ran Yi.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 599 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fang, Z., Wang, Y., Yi, R., Ma, L. (2025). Dropout Mixture Low-Rank Adaptation for Visual Parameters-Efficient Fine-Tuning. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15065. Springer, Cham. https://doi.org/10.1007/978-3-031-72667-5_21

  • DOI: https://doi.org/10.1007/978-3-031-72667-5_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72666-8

  • Online ISBN: 978-3-031-72667-5

  • eBook Packages: Computer Science; Computer Science (R0)
