
WGLora: Efficient fine-tuning method integrating weights and gradient low-rank adaptation

Published: 11 October 2024
DOI: 10.1145/3688636.3688654

Abstract

Fine-tuning large language models is increasingly constrained by the ever-growing size of weights and optimizer states. While existing low-rank adaptation techniques such as LoRA reduce memory by introducing trainable low-rank matrices, they often fall short of matching the performance of full-parameter fine-tuning. To overcome this limitation, we introduce a novel fine-tuning approach that combines weight decomposition with gradient low-rank projection. By decomposing pre-trained weights into magnitude and direction components and projecting gradients into a low-rank space, our method substantially reduces memory usage while preserving gradient statistics. This enables efficient fine-tuning with lower training and storage costs, without compromising model performance. When fine-tuning RoBERTa on the GLUE benchmark, our method achieves up to a 63% reduction in optimizer-state memory and roughly a 21% decrease in GPU memory, while improving inference time by 12%. Notably, on the RTE dataset, our approach surpasses full-parameter fine-tuning in accuracy.
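The abstract outlines two mechanisms: splitting each pre-trained weight matrix into a magnitude and a direction component (in the spirit of DoRA), and projecting gradients into a low-rank subspace before the optimizer step (in the spirit of GaLore). The paper's exact formulation is not reproduced on this page, so the sketch below is only a minimal PyTorch illustration of these two ideas; the column-wise decomposition, the SVD-based projector, the rank r = 8, and the layer dimensions are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    d_out, d_in, r = 768, 768, 8                      # illustrative sizes; rank r is an assumption

    # Weight decomposition: magnitude (per-column norm) and direction (unit-norm columns).
    W0 = torch.randn(d_out, d_in)                     # frozen pre-trained weight
    m = W0.norm(dim=0, keepdim=True)                  # trainable magnitude, one scalar per column
    A = torch.zeros(d_out, r)                         # low-rank factors adapting the direction;
    B = torch.randn(r, d_in) * 0.01                   # A @ B = 0 at init, so training starts at W0
    W_adapted = m * F.normalize(W0 + A @ B, dim=0)    # recombine: magnitude x unit-norm direction

    # Gradient low-rank projection: the optimizer only ever sees a rank-r view of the
    # gradient, so Adam-style moments are stored for an (r, d_in) matrix instead of the
    # full (d_out, d_in) one.
    G = torch.randn(d_out, d_in)                      # stand-in for a weight gradient
    U, _, _ = torch.linalg.svd(G, full_matrices=False)
    P = U[:, :r]                                      # projector, refreshed periodically in practice
    G_low = P.T @ G                                   # (r, d_in) gradient handed to the optimizer
    update_full = P @ G_low                           # project the resulting update back to full size

Because r is much smaller than the hidden size, the optimizer's moment buffers scale with r rather than with the full matrix dimension, which is consistent with the optimizer-state memory reduction reported in the abstract; how the two components are actually combined in WGLora is detailed in the paper itself.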



Published In

ICCBN '24: Proceedings of the 2024 12th International Conference on Communications and Broadband Networking
July 2024, 221 pages
ISBN: 9798400717109
DOI: 10.1145/3688636

Publisher

Association for Computing Machinery, New York, NY, United States
