Abstract
BERT, a language model pre-trained on large-scale corpora, has achieved breakthrough progress in NLP tasks. However, experimental results show that BERT's performance on Chinese tasks is not ideal. We attribute this to the fact that BERT yields only character-level embeddings, while a single Chinese character often cannot express a phrase's full meaning. To improve the model's ability to capture phrase-level semantic information, this paper proposes an enhanced BERT based on average pooling (AP-BERT). Our model applies an average pooling layer to the token embeddings and reconstructs the model's input embedding, which effectively improves BERT's performance in Chinese natural language processing. Experimental results show that the proposed method yields improvements on four Chinese tasks: text classification, named entity recognition, reading comprehension, and summary generation. The method not only improves BERT's performance on Chinese tasks but can also be applied to other pre-trained language models.
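As a rough illustration of the mechanism the abstract describes, the sketch below slides an average-pooling window over character-level token embeddings and adds the pooled phrase-level features back onto the original embeddings to reconstruct the input embedding. It is a minimal sketch assuming PyTorch; the module name `AveragePooledEmbedding`, the window size, the vocabulary size, and the additive way of combining the pooled features are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the AP-BERT idea: average pooling over character-level
# token embeddings produces phrase-level features, which are added back to
# the character embeddings to form the reconstructed input embedding.
# All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class AveragePooledEmbedding(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int, window: int = 2):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, hidden_size)
        # Pool a window of neighbouring characters; stride 1 and padding keep
        # (roughly) the original sequence length.
        self.pool = nn.AvgPool1d(kernel_size=window, stride=1, padding=window // 2)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        char_emb = self.token_embedding(input_ids)        # (batch, seq, hidden)
        pooled = self.pool(char_emb.transpose(1, 2))      # pool along the sequence axis
        pooled = pooled.transpose(1, 2)[:, : char_emb.size(1), :]
        return char_emb + pooled                          # reconstructed input embedding

# Usage sketch: this module would replace BERT's plain token-embedding lookup
# before positional and segment embeddings are added.
emb = AveragePooledEmbedding(vocab_size=21128, hidden_size=768)
ids = torch.randint(0, 21128, (1, 16))
print(emb(ids).shape)  # torch.Size([1, 16, 768])
```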
Acknowledgments
This research is supported by the National Natural Science Foundation of China (Grant No. 61773229) and the Beijing Municipal Natural Science Foundation (Grant No. KZ201710015010).
Cite this article
Zhao, S., Zhang, T., Hu, M. et al. AP-BERT: enhanced pre-trained model through average pooling. Appl Intell 52, 15929–15937 (2022). https://doi.org/10.1007/s10489-022-03190-3