When Few-Shot Learning Meets Large-Scale Knowledge-Enhanced Pre-training: Alibaba at FewCLUE

Xu, Ziyun; Wang, Chengyu; Li, Peng; Li, Yang; Wang, Ming; Hou, Boyu; Qiu, Minghui; Tang, Chengguang; Huang, Jun

doi:10.1007/978-3-030-88483-3_34

Conference paper
First Online: 06 October 2021

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13029)

Abstract

With the wide adoption of Pre-trained Language Models (PLMs), improving their performance in the few-shot learning setting has become an active research topic. FewCLUE is a new benchmark that evaluates the few-shot learning ability of PLMs over nine challenging Chinese language understanding tasks; with very little training data available, it poses significant challenges to the learning process of PLMs. In this paper, we present our solution to the FewCLUE tasks, which combines large-scale knowledge-enhanced pre-training over massive texts and knowledge triples with a new few-shot learning algorithm for downstream tasks. Experimental results show that the resulting models achieve the best performance in both the limited and unlimited tracks of FewCLUE. Our solution is built on the PyTorch version of the EasyTransfer toolkit and will be released to the public.

Notes

1. https://github.com/CLUEbenchmark/FewCLUE.
2. For the Chinese language, we can use multiple masked tokens to generate model outputs in the form of multiple Chinese characters. For simplicity, in the algorithm description, we assume there is only one masked token.
3. https://commoncrawl.org/.
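
The multi-mask idea in note 2 can be illustrated with a minimal sketch. It is not the authors' algorithm, which this page does not describe. It assumes a masked language model has already produced one probability distribution per masked position, and it scores each multi-character label word by summing per-position log-probabilities, treating the positions as independent. The function name `score_label_words`, the label words, and all probabilities are hypothetical.

```python
import math


def score_label_words(mask_probs, label_words):
    """Score candidate label words against per-mask distributions.

    mask_probs: list of dicts, one per masked position, mapping a character
        to the (hypothetical) probability a model assigns to it.
    label_words: candidate label words, each with len(mask_probs) characters.
    Returns a dict mapping each label word to its summed log-probability.
    """
    scores = {}
    for word in label_words:
        if len(word) != len(mask_probs):
            raise ValueError(f"{word!r} does not match the number of masks")
        # Independence assumption: add the log-probability of each character
        # at its own masked position; unseen characters get a tiny floor.
        scores[word] = sum(
            math.log(probs.get(ch, 1e-12))
            for probs, ch in zip(mask_probs, word)
        )
    return scores


# Made-up distributions for a two-mask sentiment prompt.
mask_probs = [
    {"满": 0.6, "失": 0.3, "一": 0.1},
    {"意": 0.7, "望": 0.2, "般": 0.1},
]
scores = score_label_words(mask_probs, ["满意", "失望"])
prediction = max(scores, key=scores.get)
print(prediction)
```

With a single masked token, as the note assumes for the algorithm description, the sum reduces to one log-probability per label word.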

Author information

Authors and Affiliations

Alibaba Group, Hangzhou, Zhejiang, 311121, China
Ziyun Xu, Chengyu Wang, Peng Li, Yang Li, Ming Wang, Boyu Hou, Minghui Qiu, Chengguang Tang & Jun Huang
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Ziyun Xu
School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai, 200433, China
Ming Wang
College of Computer Science, Chongqing University, Chongqing, 400044, China
Boyu Hou

Corresponding author

Correspondence to Minghui Qiu.

Editor information

Editors and Affiliations

University of Michigan, Ann Arbor, MI, USA
Lu Wang
Peking University, Beijing, China
Yansong Feng
Soochow University, Suzhou, China
Yu Hong
Tianjin University, Tianjin, China
Ruifang He

About this paper

Cite this paper

Xu, Z. et al. (2021). When Few-Shot Learning Meets Large-Scale Knowledge-Enhanced Pre-training: Alibaba at FewCLUE. In: Wang, L., Feng, Y., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2021. Lecture Notes in Computer Science, vol 13029. Springer, Cham. https://doi.org/10.1007/978-3-030-88483-3_34

DOI: https://doi.org/10.1007/978-3-030-88483-3_34
Published: 06 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88482-6
Online ISBN: 978-3-030-88483-3
eBook Packages: Computer Science, Computer Science (R0)