PL-Transformer: a POS-aware and layer ensemble transformer for text classification

  • Original Article
  • Neural Computing and Applications

Abstract

Transformer-based models have become the de facto standard for natural language processing (NLP) tasks. However, most of these models are designed only to capture the implicit semantics among tokens, without drawing on readily available off-the-shelf knowledge (e.g., parts of speech) that could facilitate NLP tasks. Additionally, despite stacking multiple attention-based encoders, they use only the embeddings from the last layer and ignore those from the other layers. To address these issues, we propose a novel POS-aware and layer ensemble transformer neural network, named PL-Transformer. PL-Transformer uses parts-of-speech information explicitly and jointly leverages the outputs of the different encoder layers through an encoder with correlation coefficient attention (C-Encoder). The correlation coefficient attention bounds the dot product in the C-Encoder, which improves overall model performance. Extensive experiments on four datasets demonstrate that PL-Transformer improves text classification performance; for example, it improves accuracy on the MPQA dataset by 3.95% over the vanilla transformer.




Acknowledgements

This work was supported by the Natural Science Foundation of China (No. 61976026) and the 111 Project (B18008).

Author information


Corresponding author

Correspondence to Xi Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shi, Y., Zhang, X. & Yu, N. PL-Transformer: a POS-aware and layer ensemble transformer for text classification. Neural Comput & Applic 35, 1971–1982 (2023). https://doi.org/10.1007/s00521-022-07872-4
