Integrating Semantic-Space Finetuning and Self-Training for Semi-Supervised Multi-label Text Classification

  • Conference paper
Towards Open and Trustworthy Digital Societies (ICADL 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13133)

Abstract

To address the scarcity of labeled data in document classification, semi-supervised learning uses unlabeled samples for training alongside labeled ones. Self-training is one of the best-known semi-supervised strategies: a classifier is retrained on its own predictions. However, self-training has mostly been applied to multi-class classification and rarely to the multi-label scenario. In this paper, we propose a self-training-based approach for semi-supervised multi-label document classification that introduces semantic-space finetuning and integrates it into the self-training process. Newly discovered credible predictions are used not only for classifier finetuning but also for semantic-space finetuning, which in turn benefits label propagation and helps uncover more credible predictions. Experimental results confirm the effectiveness of the proposed approach and show an improvement over the baseline methods.
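
The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a per-label logistic regression as the classifier, a cosine k-nearest-neighbour vote as a stand-in for label propagation, and raw feature vectors as a fixed embedding space (the semantic-space finetuning step is omitted). All function names, the thresholds `hi` and `lo`, and the synthetic data are hypothetical choices made for illustration only.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow warnings in exp for extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_multilabel_logreg(X, Y, W=None, lr=0.1, epochs=300):
    """Train one independent logistic regression per label by gradient descent."""
    if W is None:
        W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(epochs):
        P = sigmoid(X @ W)
        W = W - lr * X.T @ (P - Y) / X.shape[0]
    return W

def propagate_labels(emb_l, Y_l, emb_u, k=3):
    """Stand-in for label propagation: each unlabeled point gets the
    similarity-weighted mean label vector of its k nearest labeled neighbours."""
    a = emb_u / (np.linalg.norm(emb_u, axis=1, keepdims=True) + 1e-12)
    b = emb_l / (np.linalg.norm(emb_l, axis=1, keepdims=True) + 1e-12)
    sim = a @ b.T                                   # cosine similarity
    idx = np.argsort(-sim, axis=1)[:, :k]           # indices of top-k neighbours
    w = np.clip(np.take_along_axis(sim, idx, axis=1), 1e-8, None)
    return (w[:, :, None] * Y_l[idx]).sum(axis=1) / w.sum(axis=1, keepdims=True)

def self_train(X_l, Y_l, X_u, rounds=3, hi=0.9, lo=0.1):
    """Self-training: move unlabeled samples whose every label score is
    confidently on (>= hi) or off (<= lo) into the labeled pool, then retrain."""
    X_l, Y_l, X_u = X_l.copy(), Y_l.astype(float), X_u.copy()
    W = fit_multilabel_logreg(X_l, Y_l)
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        # Average the classifier scores and the propagated scores (an assumption).
        score = 0.5 * sigmoid(X_u @ W) + 0.5 * propagate_labels(X_l, Y_l, X_u)
        credible = np.all((score >= hi) | (score <= lo), axis=1)
        if not credible.any():
            break
        pseudo = (score[credible] >= hi).astype(float)
        X_l = np.vstack([X_l, X_u[credible]])
        Y_l = np.vstack([Y_l, pseudo])
        X_u = X_u[~credible]
        # Continue training from the current weights on the enlarged pool.
        W = fit_multilabel_logreg(X_l, Y_l, W)
    return W, X_l, Y_l

# Synthetic demo: three clusters carrying the label sets {0}, {1}, and {0, 1}.
rng = np.random.default_rng(0)
centers = np.array([[3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])
label_sets = np.array([[1, 0], [0, 1], [1, 1]])

def sample(n):
    c = rng.integers(0, 3, size=n)
    X = centers[c] + 0.3 * rng.standard_normal((n, 2))
    return np.hstack([X, np.ones((n, 1))]), label_sets[c]  # last column is a bias

X_lab, Y_lab = sample(9)
X_unl, _ = sample(60)
W, X_all, Y_all = self_train(X_lab, Y_lab, X_unl)
print("labeled pool grew from", len(X_lab), "to", len(X_all))
```

Requiring every label of a sample to be confident before accepting it is a deliberately conservative choice for the multi-label case; accepting individual confident labels per sample would be an alternative design.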

Author information

Corresponding author

Correspondence to Zhewei Xu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, Z., Iwaihara, M. (2021). Integrating Semantic-Space Finetuning and Self-Training for Semi-Supervised Multi-label Text Classification. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_20

  • DOI: https://doi.org/10.1007/978-3-030-91669-5_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91668-8

  • Online ISBN: 978-3-030-91669-5

  • eBook Packages: Computer Science, Computer Science (R0)
