A New Method of Improving BERT for Text Classification

Zheng, Shaomin; Yang, Meng

doi:10.1007/978-3-030-36204-1_37

A New Method of Improving BERT for Text Classification

Shaomin Zheng¹³ &
Meng Yang¹³

Conference paper
First Online: 29 November 2019

3135 Accesses
21 Citations

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11936))

Abstract

Text classification is a basic task in natural language processing. Recently, pre-training models such as BERT have achieved outstanding results compared with previous methods. However, BERT fails to take into account local information in the text such as a sentence and a phrase. In this paper, we present a BERT-CNN model for text classification. By adding CNN to the task-specific layers of BERT model, our model can get the information of important fragments in the text. In addition, we input the local representation along with the output of the BERT into the transformer encoder in order to take advantage of the self-attention mechanism and finally get the representation of the whole text through transformer layer. Extensive experiments demonstrate that our model obtains competitive performance against state-of-the-art baselines on four benchmark datasets.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Maas, A.L., et al.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1. Association for Computational Linguistics (2011)
Google Scholar
Zhang, D., Lee, W.S.: Question classification using support vector machines. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval. ACM (2003)
Google Scholar
Wang, S., Manning, C.D.: Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, vol. 2. Association for Computational Linguistics (2012)
Google Scholar
Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. In: Advances in Neural Information Processing Systems (2015)
Google Scholar
Chung, J., et al.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)
Conneau, A., et al.: Very deep convolutional networks for text classification. arXiv preprint arXiv:1606.01781 (2016)
Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)
Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075 (2015)
Huang, M., Qian, Q., Zhu, X.: Encoding syntactic knowledge in neural networks for sentiment classification. ACM Trans. Inf. Syst. (TOIS) 35(3), 26 (2017)
Article Google Scholar
Zhou, C., et al.: A C-LSTM neural network for text classification. arXiv preprint arXiv:1511.08630 (2015)
Xiao, Y., Cho, K.: Efficient character-level document classification by combining convolution and recurrent layers. arXiv preprint arXiv:1602.00367 (2016)
Wang, B.: Disconnected recurrent neural networks for text categorization. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Long Papers, vol. 1 (2018)
Google Scholar
Yang, Z., et al.: Hierarchical attention networks for document classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2016)
Google Scholar
Wang, S., Huang, M., Deng, Z.: Densely connected CNN with multi-scale feature attention for text classification. In: IJCAI (2018)
Google Scholar
Lei, Z., et al.: A multi-sentiment-resource enhanced attention network for sentiment classification. arXiv preprint arXiv:1807.04990 (2018)
Shen, T., et al.: DiSAN: directional self-attention network for RNN/CNN-free language understanding. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Google Scholar
Shen, T., et al.: Bi-directional block self-attention for fast and memory-efficient sequence modeling. arXiv preprint arXiv:1804.00857 (2018)
Peters, M.E., et al.: Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018)
Radford, A., et al.: Improving language understanding by generative pre-training (2018). https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/languageunsupervised/languageunderstandingpaper.pdf
Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146 (2018)
Devlin, J., et al.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188 (2014)
Johnson, R., Zhang, T.: Deep pyramid convolutional neural networks for text categorization. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Long Papers, vol. 1 (2017)
Google Scholar
Li, S., et al.: Initializing convolutional filters with semantic features for text classification. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (2017)
Google Scholar
Xiao, L., et al.: Learning what to share: leaky multi-task network for text classification. In: Proceedings of the 27th International Conference on Computational Linguistics (2018)
Google Scholar
Rajpurkar, P., et al.: SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250 (2016)
Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. arXiv preprint cs/0306050 (2003)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems (2017)
Google Scholar
Socher, R., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (2013)
Google Scholar
Srivastava, N., et al.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
MathSciNet MATH Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Joulin, A., et al.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)
Zhao, W., et al.: Investigating capsule networks with dynamic routing for text classification. arXiv preprint arXiv:1804.00538 (2018)

Download references

Acknowledgement

This work is partially supported by the National Natural Science Foundation of China (Grant no. 61772568), the Guangzhou Science and Technology Program (Grant no. 201804010288), and the Fundamental Research Funds for the Central Universities (Grant no. 18lgzd15).

Author information

Authors and Affiliations

School of Data and Computer Science, Sun yat-sen University, Guangzhou, Guangdong, China
Shaomin Zheng & Meng Yang

Authors

Shaomin Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Meng Yang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Meng Yang .

Editor information

Editors and Affiliations

Nanjing University of Science and Technology, Nanjing, China
Zhen Cui
Nanjing University of Science and Technology, Nanjing, China
Jinshan Pan
Nanjing University of Science and Technology, Nanjing, China
Shanshan Zhang
Nanjing University of Science and Technology, Nanjing, China
Liang Xiao
Nanjing University of Science and Technology, Nanjing, China
Jian Yang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zheng, S., Yang, M. (2019). A New Method of Improving BERT for Text Classification. In: Cui, Z., Pan, J., Zhang, S., Xiao, L., Yang, J. (eds) Intelligence Science and Big Data Engineering. Big Data and Machine Learning. IScIDE 2019. Lecture Notes in Computer Science(), vol 11936. Springer, Cham. https://doi.org/10.1007/978-3-030-36204-1_37

Download citation

DOI: https://doi.org/10.1007/978-3-030-36204-1_37
Published: 29 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36203-4
Online ISBN: 978-3-030-36204-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics