Abstract
Transformer-based models have become the de facto standard for natural language processing (NLP) tasks. However, most of these models are designed only to capture the implicit semantics among tokens, without exploiting readily available off-the-shelf knowledge (e.g., parts of speech) that could facilitate NLP tasks. Additionally, despite stacking multiple attention-based encoders, they use only the embeddings from the last layer, ignoring those from the other layers. To address these issues, in this paper we propose a novel POS-aware and layer ensemble transformer neural network, named PL-Transformer. PL-Transformer uses parts-of-speech information explicitly and jointly leverages the outputs of the different encoder layers through an encoder with correlation coefficient attention (C-Encoder). The correlation coefficient attention bounds the dot product in the C-Encoder, which improves overall model performance. Extensive experiments on four datasets demonstrate that PL-Transformer improves text classification performance; for example, accuracy on the MPQA dataset improves by 3.95% over the vanilla transformer.
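To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation: "correlation coefficient attention" is rendered here as attention whose scores are Pearson correlations of query/key vectors (bounded in [-1, 1], unlike the raw dot product), and "layer ensemble" as a weighted combination of all encoder layers' outputs rather than only the last layer. All function names, shapes, and the uniform ensemble weights are illustrative assumptions.

```python
import numpy as np

def correlation_attention(Q, K, V, eps=1e-8):
    """Single-head attention whose scores are Pearson correlations.

    Q, K, V: arrays of shape (seq_len, d_model).
    The Pearson correlation of two vectors equals the cosine similarity
    of their mean-centered versions, so every score lies in [-1, 1]
    regardless of vector magnitude (unlike an unbounded dot product).
    """
    # Center each query/key vector over its feature dimension.
    Qc = Q - Q.mean(axis=-1, keepdims=True)
    Kc = K - K.mean(axis=-1, keepdims=True)
    # Normalize; eps guards against zero-length vectors.
    Qn = Qc / (np.linalg.norm(Qc, axis=-1, keepdims=True) + eps)
    Kn = Kc / (np.linalg.norm(Kc, axis=-1, keepdims=True) + eps)
    scores = Qn @ Kn.T                    # (seq_len, seq_len), bounded
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

def layer_ensemble(layer_outputs, weights=None):
    """Combine hidden states from all encoder layers.

    layer_outputs: list of (seq_len, d_model) arrays, one per layer.
    A uniform average is used here as a placeholder; the weights could
    equally be learned parameters.
    """
    if weights is None:
        weights = np.full(len(layer_outputs), 1.0 / len(layer_outputs))
    return sum(w * h for w, h in zip(weights, layer_outputs))

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 16)) for _ in range(3))
out = correlation_attention(Q, K, V)       # (5, 16) attended values
ens = layer_ensemble([Q, K, V])            # uniform layer average
```

Because the correlation scores are bounded, the softmax input cannot grow with the embedding dimension, which is one plausible reading of why bounding the dot product helps stabilize the C-Encoder.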



Acknowledgements
This work was supported by the Natural Science Foundation of China (No. 61976026) and the 111 Project (B18008).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shi, Y., Zhang, X. & Yu, N. PL-Transformer: a POS-aware and layer ensemble transformer for text classification. Neural Comput & Applic 35, 1971–1982 (2023). https://doi.org/10.1007/s00521-022-07872-4