Multi-label Classification of Long Text Based on Key-Sentences Extraction

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12682)

Abstract

Most existing works on multi-label classification of long text perform text truncation as preprocessing, which leads to the loss of label-related global feature information. Other approaches split the entire text into multiple segments for feature extraction, which generates noisy features from irrelevant segments. To address these issues, we introduce a key-sentences extraction task trained with semi-supervised learning to quickly distinguish relevant segments, and combine it with the multi-label classification task to form a multi-task learning framework. The key-sentences extraction task can capture global information and filter out irrelevant information to improve multi-label prediction. In addition, we apply sentence distribution and a multi-label attention mechanism to improve the efficiency of our model. Experimental results on real-world datasets demonstrate that our proposed model achieves significant and consistent improvements over state-of-the-art baselines.

J. Chen and X. Gong contributed equally to this work and should be regarded as co-first authors.
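Although the full paper is not reproduced here, the abstract already outlines the architecture: a shared document encoder, a semi-supervised key-sentence extraction head, and a multi-label classifier with label-wise attention over sentence representations. The following is a minimal, hypothetical PyTorch sketch of that general idea only; the class name, encoder choice (BiGRU with mean pooling), dimensions, and the way key-sentence scores down-weight sentences are illustrative assumptions, not the authors' ACNet implementation.

```python
# Hypothetical sketch: multi-task model combining key-sentence scoring
# with multi-label classification via label-wise attention.
import torch
import torch.nn as nn


class KeySentenceMultiLabelModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=128, num_labels=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Word-level BiGRU -> sentence vectors (mean-pooled per sentence).
        self.word_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True,
                               bidirectional=True)
        # Sentence-level BiGRU captures document-wide (global) context.
        self.sent_rnn = nn.GRU(2 * hid_dim, hid_dim, batch_first=True,
                               bidirectional=True)
        # Task 1: "is this a key sentence?" score per sentence.
        self.key_head = nn.Linear(2 * hid_dim, 1)
        # Task 2: label-wise attention over sentences, then per-label logits.
        self.label_queries = nn.Parameter(torch.randn(num_labels, 2 * hid_dim))
        self.label_out = nn.Linear(2 * hid_dim, 1)

    def forward(self, docs):
        # docs: (batch, num_sents, num_words) token ids
        b, s, w = docs.shape
        words = self.embed(docs.reshape(b * s, w))            # (b*s, w, emb)
        word_states, _ = self.word_rnn(words)                 # (b*s, w, 2h)
        sent_vecs = word_states.mean(dim=1).reshape(b, s, -1)
        sent_states, _ = self.sent_rnn(sent_vecs)             # (b, s, 2h)

        # Key-sentence scores (auxiliary task); also used here to
        # down-weight sentences judged irrelevant.
        key_logits = self.key_head(sent_states).squeeze(-1)       # (b, s)
        key_weights = torch.sigmoid(key_logits).unsqueeze(-1)     # (b, s, 1)
        filtered = sent_states * key_weights

        # Multi-label attention: each label attends over filtered sentences.
        scores = torch.einsum('bsh,lh->bls', filtered, self.label_queries)
        attn = torch.softmax(scores, dim=-1)                      # (b, L, s)
        label_ctx = torch.einsum('bls,bsh->blh', attn, filtered)  # (b, L, 2h)
        label_logits = self.label_out(label_ctx).squeeze(-1)      # (b, L)
        return key_logits, label_logits
```

In a multi-task setup of this kind, the two heads would typically be trained jointly, e.g. binary cross-entropy on key_logits for the subset of sentences that carry (possibly weak) key-sentence labels, plus binary cross-entropy on label_logits against the document's label set.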


Notes

  1. https://github.com/cjymz886/ACNet.

  2. http://wenshu.court.gov.cn.


Acknowledgments

We thank all the anonymous reviewers for their insightful comments. This work is supported by the National Natural Science Foundation of China (No. 61672046).

Author information


Corresponding author

Correspondence to Zhiyi Ma.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, J., Gong, X., Qiu, Y., Chen, X., Ma, Z. (2021). Multi-label Classification of Long Text Based on Key-Sentences Extraction. In: Jensen, C.S., et al. (eds.) Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol. 12682. Springer, Cham. https://doi.org/10.1007/978-3-030-73197-7_1


  • DOI: https://doi.org/10.1007/978-3-030-73197-7_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73196-0

  • Online ISBN: 978-3-030-73197-7

  • eBook Packages: Computer Science, Computer Science (R0)
