Multi-label text classification is a critical task in the field of natural language processing. As a recent language representation model, BERT has achieved state-of-the-art results on classification tasks. Nevertheless, BERT’s text classification framework does not make full use of token-level text representations or label embeddings, since it uses only the final hidden state corresponding to the [CLS] token as a sequence-level text representation for classification. We hypothesize that finer-grained token-level text representations and label embeddings contribute to classification. Consequently, in this paper we propose a Label-Embedding Bi-directional Attentive model to improve the performance of BERT’s text classification framework. In particular, we extend BERT’s text classification framework with label embedding and bi-directional attention. Experimental results on five datasets indicate that our model achieves notable improvements over both baselines and state-of-the-art models.
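
The abstract does not spell out the attention equations, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes token-level hidden states and label embeddings share one dimension, uses a plain dot product as the token-label affinity, and scores each label independently with a sigmoid. The function name `bidirectional_label_attention` and the way the two attention directions are combined are hypothetical choices made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bidirectional_label_attention(tokens, labels):
    """Illustrative sketch (an assumption, not the paper's exact model).

    tokens: n x d token-level hidden states, e.g. an encoder's last layer.
    labels: k x d label embeddings.
    Returns (label_probs, token_to_label, label_to_token).
    """
    n, k, d = len(tokens), len(labels), len(tokens[0])

    # Affinity between every token and every label (n x k).
    affinity = [[dot(t, lab) for lab in labels] for t in tokens]

    # Token-to-label direction: each token attends over all labels.
    token_to_label = [softmax(row) for row in affinity]

    # Label-to-token direction: each label attends over all tokens.
    label_to_token = [
        softmax([affinity[i][j] for i in range(n)]) for j in range(k)
    ]

    # Enrich each token with its attention-weighted label context.
    enriched = []
    for i in range(n):
        context = [
            sum(token_to_label[i][j] * labels[j][c] for j in range(k))
            for c in range(d)
        ]
        enriched.append([tokens[i][c] + context[c] for c in range(d)])

    # Pool enriched tokens per label, then score each label independently.
    label_probs = []
    for j in range(k):
        text_vec = [
            sum(label_to_token[j][i] * enriched[i][c] for i in range(n))
            for c in range(d)
        ]
        score = dot(text_vec, labels[j])
        label_probs.append(1.0 / (1.0 + math.exp(-score)))

    return label_probs, token_to_label, label_to_token
```

Because each label receives its own sigmoid score rather than competing in a single softmax, any number of labels can be predicted for one text, which is what distinguishes the multi-label setting from single-label classification. In contrast, a classifier built only on the [CLS] hidden state would skip both attention steps and feed one pooled vector to the output layer.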

This work is partially supported by the National Natural Science Foundation of China (No. U1711263).
Liu, N., Wang, Q. & Ren, J. Label-Embedding Bi-directional Attentive Model for Multi-label Text Classification. Neural Process Lett 53, 375–389 (2021). https://doi.org/10.1007/s11063-020-10411-8