
WGLora: Efficient fine-tuning method integrating weights and gradient low-rank adaptation

Published: 11 October 2024
DOI: 10.1145/3688636.3688654

Abstract

Fine-tuning large language models is increasingly constrained by the ever-growing size of weights and optimizer states. While existing low-rank adaptation techniques such as LoRA reduce memory by introducing trainable low-rank matrices, they often fall short of matching the performance of full-parameter fine-tuning. To overcome this limitation, we introduce a novel fine-tuning approach that combines weight decomposition with gradient low-rank projection. By decomposing pre-trained weights into magnitude and direction components and projecting gradients into a low-rank space, our method substantially reduces memory usage while preserving gradient statistics. This enables efficient fine-tuning with lower training and storage costs, without compromising model performance. When fine-tuning RoBERTa on the GLUE benchmark, our method achieves up to a 63% reduction in optimizer-state memory and roughly a 21% decrease in GPU memory, while improving inference time by 12%. Notably, on the RTE dataset, our approach surpasses full-parameter fine-tuning in accuracy.
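The abstract outlines two mechanisms: splitting each pre-trained weight matrix into a magnitude and a direction component (in the spirit of DoRA), and projecting gradients into a low-rank subspace before the optimizer step (in the spirit of GaLore). The paper's exact formulation is not reproduced on this page, so the sketch below is only a minimal PyTorch illustration of these two ideas; the column-wise decomposition, the SVD-based projector, the rank r = 8, and the layer dimensions are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    d_out, d_in, r = 768, 768, 8                      # illustrative sizes; rank r is an assumption

    # Weight decomposition: magnitude (per-column norm) and direction (unit-norm columns).
    W0 = torch.randn(d_out, d_in)                     # frozen pre-trained weight
    m = W0.norm(dim=0, keepdim=True)                  # trainable magnitude, one scalar per column
    A = torch.zeros(d_out, r)                         # low-rank factors adapting the direction;
    B = torch.randn(r, d_in) * 0.01                   # A @ B = 0 at init, so training starts at W0
    W_adapted = m * F.normalize(W0 + A @ B, dim=0)    # recombine: magnitude x unit-norm direction

    # Gradient low-rank projection: the optimizer only ever sees a rank-r view of the
    # gradient, so Adam-style moments are stored for an (r, d_in) matrix instead of the
    # full (d_out, d_in) one.
    G = torch.randn(d_out, d_in)                      # stand-in for a weight gradient
    U, _, _ = torch.linalg.svd(G, full_matrices=False)
    P = U[:, :r]                                      # projector, refreshed periodically in practice
    G_low = P.T @ G                                   # (r, d_in) gradient handed to the optimizer
    update_full = P @ G_low                           # project the resulting update back to full size

Because r is much smaller than the hidden size, the optimizer's moment buffers scale with r rather than with the full matrix dimension, which is consistent with the optimizer-state memory reduction reported in the abstract; how the two components are actually combined in WGLora is detailed in the paper itself.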



Published In

ICCBN '24: Proceedings of the 2024 12th International Conference on Communications and Broadband Networking
July 2024, 221 pages
ISBN: 9798400717109
DOI: 10.1145/3688636

Publisher

Association for Computing Machinery, New York, NY, United States
