AP-BERT: enhanced pre-trained model through average pooling

Published in Applied Intelligence

Abstract

BERT, a language model pre-trained on large-scale corpora, has achieved breakthrough progress on NLP tasks. However, experimental results show that BERT performs less well on Chinese tasks. We attribute this to the fact that BERT provides only character-level embeddings, while a single Chinese character often cannot express a complete meaning on its own. To improve the model's ability to capture phrase-level semantic information, this paper proposes an enhanced BERT based on average pooling (AP-BERT). Our model applies an average pooling layer to the token embeddings and reconstructs the model's input embeddings, which effectively improves BERT's performance on Chinese natural language processing. Experiments show that the proposed method yields gains on four Chinese tasks: text classification, named entity recognition, reading comprehension, and summary generation. The method not only improves the performance of BERT on Chinese tasks but can also be applied to other pre-trained language models.
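To make the core idea concrete, the following is a minimal sketch, assuming PyTorch, of how an average pooling layer could act on character-level token embeddings to add phrase-level context before the encoder. The window size, the element-wise sum, and all names below are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn.functional as F

def phrase_pooled_embeddings(token_embeddings, window=3):
    # token_embeddings: (batch, seq_len, hidden) character-level embeddings.
    # Average each position with its neighbours in a fixed window so that every
    # token also carries local phrase-level information (illustrative choice).
    x = token_embeddings.transpose(1, 2)                 # (batch, hidden, seq_len)
    pooled = F.avg_pool1d(x, kernel_size=window, stride=1,
                          padding=window // 2,
                          count_include_pad=False)
    pooled = pooled[..., :token_embeddings.size(1)]      # trim overshoot for even windows
    pooled = pooled.transpose(1, 2)                      # back to (batch, seq_len, hidden)
    # Combine character-level and pooled phrase-level embeddings; a simple sum here.
    return token_embeddings + pooled

# Usage: feed the result to the encoder in place of the original input embeddings.
emb = torch.randn(2, 16, 768)                            # dummy character-level embeddings
print(phrase_pooled_embeddings(emb).shape)               # torch.Size([2, 16, 768])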



Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant No. 61773229) and the Beijing Municipal Natural Science Foundation (Grant No. KZ201710015010).

Author information


Corresponding author

Correspondence to Shuai Zhao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Zhao, S., Zhang, T., Hu, M. et al. AP-BERT: enhanced pre-trained model through average pooling. Appl Intell 52, 15929–15937 (2022). https://doi.org/10.1007/s10489-022-03190-3

  • DOI: https://doi.org/10.1007/s10489-022-03190-3
