A Survey of Pretrained Language Models

  • Conference paper
  • Published in: Knowledge Science, Engineering and Management (KSEM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13369)

Abstract

With the emergence of Pretrained Language Models (PLMs) and the success of large-scale PLMs such as BERT and GPT, the field of Natural Language Processing (NLP) has advanced rapidly, and PLMs have become an indispensable technique for solving NLP problems. In this paper, we survey PLMs to help researchers quickly understand the various models available and choose the appropriate ones for their specific NLP projects. Specifically, we first outline the main machine learning methods underlying PLMs. Second, we examine early PLMs and discuss the main state-of-the-art PLMs. Third, we review several Chinese PLMs. Fourth, we compare the performance of some mainstream PLMs. Fifth, we outline the applications of PLMs. Finally, we give an outlook on the future development of PLMs.
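
As an illustrative aside, not drawn from the paper itself (the survey prescribes no particular toolkit): a minimal sketch of the kind of first experiment a researcher might run when evaluating a PLM such as BERT, assuming the Hugging Face transformers library and PyTorch are installed.

    # Minimal sketch (assumptions: Hugging Face transformers + PyTorch available;
    # the model checkpoint and example sentence are illustrative choices).
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    inputs = tokenizer("Pretrained language models power modern NLP.",
                       return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.last_hidden_state holds one contextual vector per input token;
    # the [CLS] vector is a common (if coarse) sentence-level representation.
    cls_vector = outputs.last_hidden_state[:, 0, :]
    print(cls_vector.shape)  # torch.Size([1, 768]) for bert-base-uncased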

Acknowledgment

This work was supported by the National Natural Science Foundation of China (No. 61762016) and the Graduate Student Innovation Project of School of Computer Science and Engineering, Guangxi Normal University (JXXYYJSCXXM-2021-001).

Author information

Correspondence to Xudong Luo.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, K., Luo, X., Luo, M.Y. (2022). A Survey of Pretrained Language Models. In: Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M. (eds) Knowledge Science, Engineering and Management. KSEM 2022. Lecture Notes in Computer Science, vol 13369. Springer, Cham. https://doi.org/10.1007/978-3-031-10986-7_36

  • DOI: https://doi.org/10.1007/978-3-031-10986-7_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10985-0

  • Online ISBN: 978-3-031-10986-7

  • eBook Packages: Computer Science, Computer Science (R0)
