Abstract
Assigning several labels to digital data is becoming easier because it can be done collaboratively by Internet users. However, the process remains challenging, especially when several labels are assigned to each datum, because some suitable labels may be missed, and these missing labels reduce classification accuracy. In this study, we propose a novel graph-based multi-label classifier that stably achieves high accuracy even when the training data contain missing labels. The core process of our algorithm smooths the label values of each training datum using its top-k most similar data: label values are propagated from these neighbors and averaged, which generates values for the labels missing from the training data. In experimental evaluations, we used multi-labeled document and image datasets and measured the micro-averaged F-scores of eight classifiers. As we incrementally removed correct labels from the two datasets, the proposed algorithm tended to maintain its F-scores, whereas the scores of the other classifiers decreased. In addition, we evaluated the algorithm on Wikipedia, a real-world dataset that includes missing labels, to determine how well it predicts the correct labels and how useful its predictions are as initial suggestions for manual annotation. We confirmed that the proposed algorithm, LPAC, is useful not only for automatic annotation but also for facilitating decision making in initial manual category assignment.
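The abstract describes the core smoothing step only at a high level. The sketch below is one possible reading of it under stated assumptions, not the authors' LPAC implementation: cosine similarity between feature vectors defines each item's top-k neighbors, each training item's label vector is replaced by the mean of its own vector and its neighbors' vectors, observed positive labels are kept fixed at 1.0 (an assumption borrowed from standard label propagation), and a test item is classified by averaging its neighbors' smoothed labels and applying a 0.5 threshold. All function names (`smooth_labels`, `predict`, `micro_f1`) and parameters are hypothetical. The last function computes a micro-averaged F-score, which pools true positives, false positives and false negatives over all items and labels before computing F1.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_neighbors(query: Sequence[float], data: Matrix, k: int, exclude: int = -1) -> List[int]:
    """Indices of the k rows of `data` most similar to `query`, skipping index `exclude`."""
    scored = [(cosine(query, row), j) for j, row in enumerate(data) if j != exclude]
    scored.sort(key=lambda t: (-t[0], t[1]))  # most similar first; ties broken by index
    return [j for _, j in scored[:k]]


def smooth_labels(X: Matrix, Y: Matrix, k: int, iterations: int = 1) -> List[List[float]]:
    """Hypothetical smoothing step that fills in values for possibly missing labels.

    Assumption: each item's label vector becomes the mean of its own vector and those
    of its top-k neighbors; observed positive labels are clamped back to 1.0.
    """
    neighbors = [top_k_neighbors(x, X, k, exclude=i) for i, x in enumerate(X)]
    current = [[float(v) for v in y] for y in Y]
    for _ in range(iterations):
        updated = []
        for i, own in enumerate(current):
            group = [own] + [current[j] for j in neighbors[i]]
            mean = [sum(col) / len(group) for col in zip(*group)]
            updated.append([1.0 if Y[i][c] else mean[c] for c in range(len(mean))])
        current = updated  # synchronous update: every item reads the previous round
    return current


def predict(x: Sequence[float], X: Matrix, Y_smoothed: Matrix, k: int, threshold: float = 0.5) -> List[int]:
    """Average the smoothed labels of the k (>= 1) nearest training items, then threshold."""
    nb = top_k_neighbors(x, X, k)
    n_labels = len(Y_smoothed[0])
    scores = [sum(Y_smoothed[j][c] for j in nb) / len(nb) for c in range(n_labels)]
    return [1 if s >= threshold else 0 for s in scores]


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all items and labels, then 2TP / (2TP + FP + FN)."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data: items 0/1 and 2/3 are near-duplicates; item 1 is missing label B (column 1).
    X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    Y = [[1, 1], [1, 0], [0, 1], [0, 1]]
    S = smooth_labels(X, Y, k=1)
    print(S)  # item 1 receives 0.5 for its missing label B
    print(predict([0.95, 0.05], X, S, k=2))
    print(micro_f1([[1, 1], [1, 0]], [[1, 1], [1, 1]]))
```

In the toy run, item 1 lacks label B while its near-duplicate item 0 carries it, so smoothing gives item 1 a value of 0.5 for B instead of 0; this is how averaging over neighbors lets similar data supply evidence for a missing label. Clamping observed labels keeps the averaging from diluting labels known to be correct; whether the original method does this is not stated in the abstract.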
Acknowledgements
This work was supported in part by MEXT Grant-in-Aid (#19K20631).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sumikawa, Y., Miyazaki, T. Multilabel graph-based classification for missing labels. Int J Digit Libr 22, 85–104 (2021). https://doi.org/10.1007/s00799-020-00295-3