Abstract
To address the scarcity of labeled data in document classification tasks, semi-supervised learning, in which unlabeled samples are also utilized for training, has been studied. Self-training is one of the classic strategies for semi-supervised learning, in which a classifier is trained on its own predictions. However, self-training has mostly been applied to multi-class classification, and rarely to the multi-label scenario. In this paper, we propose a self-training-based approach for semi-supervised multi-label document classification, in which semantic-space finetuning is introduced and integrated into the self-training process. Newly discovered credible predictions are used not only for classifier finetuning but also for semantic-space finetuning, which in turn benefits label propagation in discovering further credible predictions. Experimental results confirm the effectiveness of the proposed approach and show a satisfactory improvement over the baseline methods.
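The iterative loop sketched in the abstract can be made concrete as follows. The snippet below is a minimal, self-contained illustration, not the authors' implementation: the cosine-kNN label-propagation routine, the per-round pseudo-label budget, and all function names and toy data are assumptions made for exposition, and the semantic-space finetuning step (e.g., updating a Sentence-BERT encoder on the credible pseudo-labels and re-embedding the corpus) is indicated only as a comment.

    # Sketch of self-training with graph-based label propagation over a
    # document embedding space. Illustrative assumptions throughout; this is
    # NOT the paper's code.
    import numpy as np

    def knn_label_propagation(emb, Y0, mask, k=10, alpha=0.99, n_iters=20):
        """Propagate labels over a cosine kNN graph.
        emb:  (n, d) document embeddings (the "semantic space")
        Y0:   (n, L) binary label matrix; rows where mask is False are ignored
        mask: (n,) bool array marking labeled / credible documents
        Returns (n, L) soft label scores for all documents."""
        normed = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
        S = normed @ normed.T
        np.fill_diagonal(S, 0.0)
        kth = -np.sort(-S, axis=1)[:, k - 1][:, None]
        W = np.clip(np.where(S >= kth, S, 0.0), 0.0, None)  # keep top-k edges
        W = 0.5 * (W + W.T)                                  # symmetrize
        d = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
        W = W * d[:, None] * d[None, :]                      # normalize graph
        Y_init = np.where(mask[:, None], Y0, 0.0)
        F = np.zeros_like(Y_init)
        for _ in range(n_iters):
            F = alpha * (W @ F) + (1 - alpha) * Y_init       # diffuse labels
        return F

    # Toy data: 100 documents, 16-dim embeddings, 5 labels, first 20 labeled.
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(100, 16))
    Y = (rng.random((100, 5)) > 0.7).astype(float)
    mask = np.zeros(100, dtype=bool)
    mask[:20] = True

    m_new = 15  # pseudo-labels accepted per round (an assumed budget)
    for round_ in range(3):
        scores = knn_label_propagation(emb, Y, mask)
        peak = scores.max(axis=1)
        peak[mask] = -np.inf                     # consider only unlabeled docs
        credible = np.argsort(-peak)[:m_new]     # most credible predictions
        top = scores[credible].max(axis=1, keepdims=True)
        Y[credible] = (scores[credible] >= 0.5 * top).astype(float)
        mask[credible] = True
        # In the paper, the enlarged credible set would now finetune BOTH the
        # classifier and the embedding encoder ("semantic-space finetuning"),
        # after which emb is re-computed; emb stays fixed in this sketch.
        print(f"round {round_}: {mask.sum()} labeled/credible documents")

Keeping the embeddings fixed makes the sketch degenerate to plain label propagation; the paper's point is precisely that re-finetuning the semantic space with each round's credible predictions improves the graph over which the next round propagates.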
Cite this paper
Xu, Z., Iwaihara, M. (2021). Integrating Semantic-Space Finetuning and Self-Training for Semi-Supervised Multi-label Text Classification. In: Ke, H.R., Lee, C.S., Sugiyama, K. (eds.) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol. 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_20
Print ISBN: 978-3-030-91668-8
Online ISBN: 978-3-030-91669-5