Adjusting BERT’s Pooling Layer for Large-Scale Multi-Label Text Classification

Lehečka, Jan; Švec, Jan; Ircing, Pavel; Šmídl, Luboš

doi:10.1007/978-3-030-58323-1_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12284)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue


Abstract

In this paper, we present our experiments with BERT models on the task of Large-scale Multi-label Text Classification (LMTC). In the LMTC task, each text document can have multiple class labels, while the total number of classes is on the order of thousands. We propose a pooling layer architecture on top of BERT models that improves classification quality by combining information from the standard [CLS] token with pooled sequence output. We demonstrate the improvements on Wikipedia datasets in three different languages using public pre-trained BERT models.
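The idea of combining the [CLS] vector with pooled sequence output can be illustrated with a minimal sketch. The abstract does not say which pooling operation or combination rule the authors use, so the masked mean pooling, the concatenation, the sigmoid output layer, the 0.5 threshold, and all function and variable names below are assumptions chosen for illustration only. Random arrays stand in for the token vectors that a BERT encoder would produce.

```python
import numpy as np


def combined_pooling(sequence_output, attention_mask):
    """Concatenate the [CLS] vector with a masked mean over token vectors.

    sequence_output: (batch, seq_len, hidden) final-layer token vectors
    attention_mask:  (batch, seq_len), 1 for real tokens and 0 for padding
    Mean pooling and concatenation are assumptions, not the paper's method.
    """
    cls_vector = sequence_output[:, 0, :]  # [CLS] is the first token
    mask = attention_mask[:, :, None].astype(sequence_output.dtype)
    summed = (sequence_output * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # guard against empty rows
    mean_vector = summed / counts
    return np.concatenate([cls_vector, mean_vector], axis=-1)


def multilabel_probabilities(pooled, weights, bias):
    """One independent sigmoid per label, so several labels can be active."""
    logits = pooled @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))


# Toy stand-in for encoder output: 2 documents, 5 tokens, hidden size 8.
rng = np.random.default_rng(0)
batch, seq_len, hidden, num_labels = 2, 5, 8, 4
sequence_output = rng.normal(size=(batch, seq_len, hidden))
attention_mask = np.array([[1, 1, 1, 0, 0],
                           [1, 1, 1, 1, 1]])

pooled = combined_pooling(sequence_output, attention_mask)

# Hypothetical classifier weights; in practice these would be trained.
weights = rng.normal(size=(2 * hidden, num_labels)) * 0.1
bias = np.zeros(num_labels)

probs = multilabel_probabilities(pooled, weights, bias)
predicted = probs > 0.5  # illustrative threshold, one decision per label
```

In this sketch the pooled representation is twice the hidden size, because the [CLS] vector and the mean vector are placed side by side. In an LMTC setting the output layer would have thousands of columns instead of four.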



Acknowledgments

This research was supported by the Ministry of Culture of the Czech Republic, project No. DG18P02OVV016.

Author information

Authors and Affiliations

Department of Cybernetics, University of West Bohemia in Pilsen, Pilsen, Czech Republic
Jan Lehečka, Jan Švec, Pavel Ircing & Luboš Šmídl

Corresponding author

Correspondence to Jan Lehečka.

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka, Ivan Kopeček, Karel Pala & Aleš Horák


Cite this paper

Lehečka, J., Švec, J., Ircing, P., Šmídl, L. (2020). Adjusting BERT’s Pooling Layer for Large-Scale Multi-Label Text Classification. In: Sojka, P., Kopeček, I., Pala, K., Horák, A. (eds) Text, Speech, and Dialogue. TSD 2020. Lecture Notes in Computer Science, vol 12284. Springer, Cham. https://doi.org/10.1007/978-3-030-58323-1_23


DOI: https://doi.org/10.1007/978-3-030-58323-1_23
Published: 01 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58322-4
Online ISBN: 978-3-030-58323-1
eBook Packages: Computer Science, Computer Science (R0)
