Abstract
With the current success of large-scale pre-trained models (PTMs), how to efficiently adapt PTMs to downstream tasks has attracted tremendous attention, especially for PTMs with billions of parameters. Previous work focuses on designing parameter-efficient tuning paradigms but still needs to store and compute the gradients of the whole computational graph. In this paper, we propose \(\cal{Y}\)-Tuning, an efficient yet effective paradigm for adapting frozen large-scale PTMs to specific downstream tasks. \(\cal{Y}\)-Tuning learns dense representations for the labels \(\cal{Y}\) defined in a given task and aligns them to fixed feature representations. Without computing the gradients of the text encoder at the training phase, \(\cal{Y}\)-Tuning is not only parameter-efficient but also training-efficient. Experimental results show that for DeBERTaXXL with 1.6 billion parameters, \(\cal{Y}\)-Tuning achieves more than 96% of the performance of full fine-tuning on the GLUE benchmark with only 2% of the tunable parameters and much lower training costs.
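To make the mechanism concrete, the sketch below shows one way a label-representation head of this kind could be wired up in PyTorch: the text encoder stays frozen and produces fixed features, while only a small set of label embeddings and a lightweight alignment module are trained. This is a minimal illustrative sketch, not the authors' implementation; the class name YTuningHead, the cross-attention alignment module, and the Hugging Face model name in the usage comment are assumptions.

```python
import torch
import torch.nn as nn


class YTuningHead(nn.Module):
    """Minimal sketch of a label-representation tuning head (illustrative only).

    A frozen pre-trained encoder supplies fixed token features; only the label
    embeddings and the small alignment module below receive gradients.
    """

    def __init__(self, num_labels: int, hidden_size: int, num_heads: int = 8):
        super().__init__()
        # Dense, trainable representations for the labels Y defined by the task.
        self.label_embeddings = nn.Parameter(torch.randn(num_labels, hidden_size) * 0.02)
        # Lightweight cross-attention that aligns label representations to the
        # frozen text features (assumed component; the paper's module may differ).
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, frozen_features: torch.Tensor) -> torch.Tensor:
        # frozen_features: (batch, seq_len, hidden) from the frozen PTM, computed without grad.
        batch = frozen_features.size(0)
        labels = self.label_embeddings.unsqueeze(0).expand(batch, -1, -1)
        aligned, _ = self.cross_attn(labels, frozen_features, frozen_features)
        return self.score(aligned).squeeze(-1)  # (batch, num_labels) logits


# Usage sketch with a frozen encoder (model name and variable names are hypothetical):
# encoder = AutoModel.from_pretrained("microsoft/deberta-xlarge").eval()
# for p in encoder.parameters():
#     p.requires_grad_(False)
# with torch.no_grad():
#     feats = encoder(**batch_inputs).last_hidden_state
# logits = YTuningHead(num_labels=2, hidden_size=feats.size(-1))(feats)
# loss = nn.functional.cross_entropy(logits, gold_labels)
```

Because the encoder's forward pass runs under `torch.no_grad()`, no activations or gradients of the large model need to be kept, which is where the training-efficiency claim comes from.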
Acknowledgements
The research was supported by the National Key R&D Program of China (No. 2020AAA0108702) and the National Natural Science Foundation of China (Grant No. 62022027).
Author information
Yitao Liu is a postgraduate student at the School of Computer Science, Fudan University, China. His major research area lies in natural language processing and deep learning.
Chenxin An is a postgraduate student at the School of Computer Science, Fudan University, China. His major research area lies in natural language processing and deep learning.
Xipeng Qiu is a professor at the School of Computer Science, Fudan University, China. He received his BS and PhD degrees from Fudan University, China. His major research area lies in natural language processing and deep learning.
Electronic Supplementary Material
\(\cal{Y}\)-Tuning: An Efficient Tuning Paradigm for Large-Scale Pre-Trained Models via Label Representation Learning
Cite this article
Liu, Y., An, C. & Qiu, X. \(\cal{Y}\)-Tuning: an efficient tuning paradigm for large-scale pre-trained models via label representation learning. Front. Comput. Sci. 18, 184320 (2024). https://doi.org/10.1007/s11704-023-3131-8