Multi-label Classification of Long Text Based on Key-Sentences Extraction

Chen, Jiayin; Gong, Xiaolong; Qiu, Ye; Chen, Xi; Ma, Zhiyi

doi:10.1007/978-3-030-73197-7_1

Conference paper
First Online: 06 April 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12682)

Abstract

Most existing work on multi-label classification of long text truncates the input during preprocessing, which loses label-related global feature information. Other approaches split the entire text into multiple segments for feature extraction, which introduces noisy features from irrelevant segments. To address these issues, we introduce a key-sentences extraction task, trained with semi-supervised learning, to quickly distinguish relevant segments; this task is combined with the multi-label classification task to form a multi-task learning framework. The key-sentences extraction task can capture global information and filter out irrelevant information to improve multi-label prediction. In addition, we apply sentence distribution and a multi-label attention mechanism to improve the efficiency of our model. Experimental results on real-world datasets demonstrate that our proposed model achieves significant and consistent improvements over state-of-the-art baselines.
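The abstract does not specify the training objective, so the following is only a minimal sketch of how a key-sentence extraction task and a multi-label classification task might share one multi-task loss. It is not the authors' implementation. The weighted-sum form, the weight `alpha`, the 0.5 threshold, the use of `None` for unlabeled sentences (a simple stand-in for the semi-supervised setting), and all function names are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce(logits, targets):
    """Mean binary cross-entropy over the labeled entries.

    Entries whose target is None are treated as unlabeled and skipped
    (an assumption standing in for the semi-supervised setting).
    """
    eps = 1e-12
    losses = []
    for z, t in zip(logits, targets):
        if t is None:
            continue
        p = sigmoid(z)
        losses.append(-(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)))
    return sum(losses) / len(losses) if losses else 0.0


def joint_loss(label_logits, label_targets, sent_logits, sent_targets, alpha=0.5):
    """Hypothetical multi-task loss: classification loss plus a weighted
    key-sentence extraction loss. `alpha` is an assumed hyperparameter."""
    return bce(label_logits, label_targets) + alpha * bce(sent_logits, sent_targets)


def select_key_sentences(sentences, sent_logits, threshold=0.5):
    """Keep sentences whose predicted relevance reaches the threshold,
    so the classifier sees relevant text instead of a truncated prefix."""
    return [s for s, z in zip(sentences, sent_logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    sentences = ["The defendant was charged.", "Page 3 of 40.", "Fraud was alleged."]
    sent_logits = [2.0, -3.0, 1.5]    # made-up relevance scores
    sent_targets = [1, 0, None]       # third sentence is unlabeled
    label_logits = [1.2, -0.7]        # made-up scores for two labels
    label_targets = [1, 0]
    print(select_key_sentences(sentences, sent_logits))
    print(joint_loss(label_logits, label_targets, sent_logits, sent_targets))
```

In this sketch the extraction scores serve two purposes: they contribute a training signal through the joint loss, and they filter the sentences passed to the classifier. This mirrors the abstract's description of capturing global information while filtering out irrelevant segments, but the actual coupling in the paper may differ.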

J. Chen and X. Gong: These authors contributed equally to this work and should be regarded as co-first authors.

Acknowledgments

We thank all the anonymous reviewers for their insightful comments. This work is supported by the National Natural Science Foundation of China (No. 61672046).

Author information

Authors and Affiliations

Advanced Institute of Information Technology, Peking University, Hangzhou, China
Jiayin Chen, Xi Chen & Zhiyi Ma
School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Xiaolong Gong, Ye Qiu & Zhiyi Ma

Corresponding author

Correspondence to Zhiyi Ma.

Editor information

Editors and Affiliations

Aalborg University, Aalborg, Denmark
Christian S. Jensen
Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Academia Sinica, Taipei, Taiwan
De-Nian Yang
The Pennsylvania State University, University Park, PA, USA
Wang-Chien Lee
National Chiao Tung University, Hsinchu, Taiwan
Vincent S. Tseng
Athens University of Economics and Business, Athens, Greece
Vana Kalogeraki
National Cheng Kung University, Tainan City, Taiwan
Jen-Wei Huang
National Tsing Hua University, Hsinchu, Taiwan
Chih-Ya Shen

About this paper

Cite this paper

Chen, J., Gong, X., Qiu, Y., Chen, X., Ma, Z. (2021). Multi-label Classification of Long Text Based on Key-Sentences Extraction. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol 12682. Springer, Cham. https://doi.org/10.1007/978-3-030-73197-7_1

DOI: https://doi.org/10.1007/978-3-030-73197-7_1
Published: 06 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73196-0
Online ISBN: 978-3-030-73197-7
eBook Packages: Computer Science, Computer Science (R0)