Integrating Semantic-Space Finetuning and Self-Training for Semi-Supervised Multi-label Text Classification

Xu, Zhewei; Iwaihara, Mizuho

doi:10.1007/978-3-030-91669-5_20


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13133)

Included in the following conference series:

International Conference on Asian Digital Libraries


Abstract

Labeled data is often scarce in document classification tasks. Semi-supervised learning addresses this shortage by also using unlabeled samples for training. Self-training is one of its best-known strategies: a classifier is retrained on its own predictions. However, self-training has mostly been applied to multi-class classification and rarely to the multi-label scenario. In this paper, we propose a self-training-based approach for semi-supervised multi-label document classification, in which semantic-space finetuning is introduced and integrated into the self-training process. Newly discovered credible predictions are used not only for classifier finetuning but also for semantic-space finetuning, which in turn benefits label propagation and helps it discover more credible predictions. Experimental results confirm the effectiveness of the proposed approach and show a satisfactory improvement over the baseline methods.
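To make the idea of "credible predictions" concrete, the following is a minimal sketch of one generic self-training round in the multi-label setting. It is not the authors' method: the abstract does not state how credibility is decided, so the confidence thresholds `high` and `low`, the function names, and the toy scores are all assumptions made for illustration. Semantic-space finetuning and label propagation are not modeled here.

```python
from typing import Dict, List, Tuple


def select_credible(
    probs: Dict[str, List[float]],
    high: float = 0.9,  # assumed threshold: score at or above this means "label present"
    low: float = 0.1,   # assumed threshold: score at or below this means "label absent"
) -> Dict[str, List[int]]:
    """Return pseudo-label vectors for documents whose every label score is confident.

    In the multi-label setting each label is a separate yes/no decision, so a
    document is kept only if none of its scores falls in the uncertain band
    between `low` and `high`.
    """
    credible: Dict[str, List[int]] = {}
    for doc_id, scores in probs.items():
        if all(p >= high or p <= low for p in scores):
            credible[doc_id] = [1 if p >= high else 0 for p in scores]
    return credible


def self_training_round(
    labeled: Dict[str, List[int]],
    unlabeled_probs: Dict[str, List[float]],
    high: float = 0.9,
    low: float = 0.1,
) -> Tuple[Dict[str, List[int]], List[str]]:
    """Move credible predictions from the unlabeled pool into the labeled set."""
    new = select_credible(unlabeled_probs, high, low)
    merged = dict(labeled)
    merged.update(new)
    remaining = [d for d in unlabeled_probs if d not in new]
    return merged, remaining


# Toy data: one labeled document and three unlabeled documents with
# hypothetical per-label classifier scores for three labels.
labeled = {"d0": [1, 0, 1]}
unlabeled_probs = {
    "d1": [0.97, 0.03, 0.92],
    "d2": [0.95, 0.50, 0.02],  # the middle score is uncertain, so d2 is held back
    "d3": [0.05, 0.99, 0.01],
}
labeled, remaining = self_training_round(labeled, unlabeled_probs)
print(sorted(labeled))
print(remaining)
```

In a full pipeline the enlarged labeled set would be used to finetune the classifier again, and the round would repeat on the remaining documents until no new credible predictions appear.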



Author information

Authors and Affiliations

Graduate School of Information, Production and Systems, Waseda University, Kitakyushu, 808-0135, Japan
Zhewei Xu & Mizuho Iwaihara


Corresponding author

Correspondence to Zhewei Xu.

Editor information

Editors and Affiliations

National Taiwan Normal University, Taipei, Taiwan
Hao-Ren Ke
Nanyang Technological University, Singapore, Singapore
Chei Sian Lee
Kyoto University, Kyoto, Japan
Kazunari Sugiyama

About this paper

Cite this paper

Xu, Z., Iwaihara, M. (2021). Integrating Semantic-Space Finetuning and Self-Training for Semi-Supervised Multi-label Text Classification. In: Ke, H.-R., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_20


DOI: https://doi.org/10.1007/978-3-030-91669-5_20
Published: 30 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91668-8
Online ISBN: 978-3-030-91669-5
eBook Packages: Computer Science; Computer Science (R0)
