Abstract
BERT, a language model pre-trained on large-scale corpora, has achieved breakthrough progress in NLP tasks. However, experimental results show that BERT's performance on Chinese tasks is not ideal. We attribute this to the fact that BERT yields only character-level embeddings, while a single Chinese character often cannot express a phrase's full meaning. To improve the model's ability to capture phrase-level semantic information, this paper proposes an enhanced BERT based on average pooling (AP-BERT). Our model applies an average pooling layer to the token embeddings and reconstructs the model's input embedding, which effectively improves BERT's performance in Chinese natural language processing. Experimental results show that the proposed method yields improvements on four Chinese tasks: text classification, named entity recognition, reading comprehension, and summary generation. The method not only improves BERT's performance on Chinese tasks but can also be applied to other pre-trained language models.
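As a rough illustration of the mechanism the abstract describes, the sketch below slides an average-pooling window over character-level token embeddings and adds the pooled phrase-level features back onto the original embeddings to reconstruct the input embedding. It is a minimal sketch assuming PyTorch; the module name `AveragePooledEmbedding`, the window size, the vocabulary size, and the additive way of combining the pooled features are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the AP-BERT idea: average pooling over character-level
# token embeddings produces phrase-level features, which are added back to
# the character embeddings to form the reconstructed input embedding.
# All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class AveragePooledEmbedding(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int, window: int = 2):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, hidden_size)
        # Pool a window of neighbouring characters; stride 1 and padding keep
        # (roughly) the original sequence length.
        self.pool = nn.AvgPool1d(kernel_size=window, stride=1, padding=window // 2)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        char_emb = self.token_embedding(input_ids)        # (batch, seq, hidden)
        pooled = self.pool(char_emb.transpose(1, 2))      # pool along the sequence axis
        pooled = pooled.transpose(1, 2)[:, : char_emb.size(1), :]
        return char_emb + pooled                          # reconstructed input embedding

# Usage sketch: this module would replace BERT's plain token-embedding lookup
# before positional and segment embeddings are added.
emb = AveragePooledEmbedding(vocab_size=21128, hidden_size=768)
ids = torch.randint(0, 21128, (1, 16))
print(emb(ids).shape)  # torch.Size([1, 16, 768])
```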
Acknowledgments
This research is supported by the National Natural Science Foundation of China (Grant No. 61773229) and the Beijing Municipal Natural Science Foundation (Grant No. KZ201710015010).
Cite this article
Zhao, S., Zhang, T., Hu, M. et al. AP-BERT: enhanced pre-trained model through average pooling. Appl Intell 52, 15929–15937 (2022). https://doi.org/10.1007/s10489-022-03190-3