Abstract
Transformer-based models have become the de facto standard for natural language processing (NLP) tasks. However, most of these models are designed only to capture the implicit semantics among tokens, without exploiting readily available off-the-shelf knowledge (e.g., parts of speech) that could facilitate NLP tasks. Additionally, despite stacking multiple attention-based encoders, they use only the embeddings from the last layer, ignoring those from the other layers. To address these issues, in this paper we propose a novel POS-aware and layer ensemble transformer neural network, named PL-Transformer. PL-Transformer uses parts-of-speech information explicitly and jointly leverages the outputs of the different encoder layers through an encoder with correlation coefficient attention (C-Encoder). The correlation coefficient attention bounds the dot product in the C-Encoder, which improves overall model performance. Extensive experiments on four datasets demonstrate that PL-Transformer improves text classification performance; for example, accuracy on the MPQA dataset improves by 3.95% over the vanilla transformer.
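To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation: "correlation coefficient attention" is rendered here as attention whose scores are Pearson correlations of query/key vectors (bounded in [-1, 1], unlike the raw dot product), and "layer ensemble" as a weighted combination of all encoder layers' outputs rather than only the last layer. All function names, shapes, and the uniform ensemble weights are illustrative assumptions.

```python
import numpy as np

def correlation_attention(Q, K, V, eps=1e-8):
    """Single-head attention whose scores are Pearson correlations.

    Q, K, V: arrays of shape (seq_len, d_model).
    The Pearson correlation of two vectors equals the cosine similarity
    of their mean-centered versions, so every score lies in [-1, 1]
    regardless of vector magnitude (unlike an unbounded dot product).
    """
    # Center each query/key vector over its feature dimension.
    Qc = Q - Q.mean(axis=-1, keepdims=True)
    Kc = K - K.mean(axis=-1, keepdims=True)
    # Normalize; eps guards against zero-length vectors.
    Qn = Qc / (np.linalg.norm(Qc, axis=-1, keepdims=True) + eps)
    Kn = Kc / (np.linalg.norm(Kc, axis=-1, keepdims=True) + eps)
    scores = Qn @ Kn.T                    # (seq_len, seq_len), bounded
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

def layer_ensemble(layer_outputs, weights=None):
    """Combine hidden states from all encoder layers.

    layer_outputs: list of (seq_len, d_model) arrays, one per layer.
    A uniform average is used here as a placeholder; the weights could
    equally be learned parameters.
    """
    if weights is None:
        weights = np.full(len(layer_outputs), 1.0 / len(layer_outputs))
    return sum(w * h for w, h in zip(weights, layer_outputs))

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 16)) for _ in range(3))
out = correlation_attention(Q, K, V)       # (5, 16) attended values
ens = layer_ensemble([Q, K, V])            # uniform layer average
```

Because the correlation scores are bounded, the softmax input cannot grow with the embedding dimension, which is one plausible reading of why bounding the dot product helps stabilize the C-Encoder.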



Acknowledgements
This work was supported by the Natural Science Foundation of China (No. 61976026) and the 111 Project (B18008).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shi, Y., Zhang, X. & Yu, N. PL-Transformer: a POS-aware and layer ensemble transformer for text classification. Neural Comput & Applic 35, 1971–1982 (2023). https://doi.org/10.1007/s00521-022-07872-4