Abstract
Because labeled data is scarce in semi-supervised learning, approaches to this problem make strong assumptions about the data and perform well only when those assumptions hold. Considerable effort may therefore have to be spent on understanding the data before the most suitable model can be chosen, a process that can be as critical as gathering labeled data itself. One way to overcome this hindrance is to control the contribution of the different assumptions to the model, rendering it capable of performing reasonably well in a wide range of applications. In this paper we propose a collective matrix factorization model that simultaneously decomposes the predictor, neighborhood and target matrices (PNT-CMF) to achieve semi-supervised classification. By controlling how strongly it relies on each assumption, PNT-CMF is able to perform well on a wider variety of datasets. Experiments on synthetic and real-world datasets show that, while state-of-the-art models (TSVM and LapSVM) excel on datasets that match their characteristics and suffer a performance drop on the others, our approach outperforms them by remaining consistently competitive across different situations.
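To make the abstract's idea concrete, the sketch below shows one plausible form of collective factorization that shares instance factors across a predictor, neighborhood and target matrix, with per-matrix weights controlling how strongly each assumption contributes. This is an illustrative assumption, not the paper's actual objective or optimization algorithm: all names, shapes, weights and the plain gradient-descent solver are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem; all sizes and hyperparameters are illustrative assumptions.
n, d, k = 30, 8, 4                      # instances, features, latent dimensions
X = rng.normal(size=(n, d))             # predictor matrix (features)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
N = np.exp(-0.5 * sq)                   # neighborhood matrix (RBF similarities)
Y = (X[:, 0] > 0).astype(float).reshape(-1, 1)  # binary targets, one column
mask = np.zeros_like(Y)
mask[:5] = 1.0                          # pretend only the first 5 rows are labeled

# Shared instance factors U, plus one factor matrix per decomposed matrix.
U = rng.normal(scale=0.1, size=(n, k))
V = rng.normal(scale=0.1, size=(d, k))  # feature factors:  X ~ U V^T
W = rng.normal(scale=0.1, size=(n, k))  # neighbor factors: N ~ U W^T
H = rng.normal(scale=0.1, size=(1, k))  # target factors:   Y ~ U H^T

a_x, a_n, a_y = 1.0, 1.0, 1.0           # weights controlling each assumption
lam, lr, steps = 0.01, 0.01, 500        # L2 regularization, step size, iterations

def loss():
    """Weighted sum of squared reconstruction errors plus L2 regularization."""
    return (a_x * ((U @ V.T - X) ** 2).sum()
            + a_n * ((U @ W.T - N) ** 2).sum()
            + a_y * ((mask * (U @ H.T - Y)) ** 2).sum()
            + lam * sum((M ** 2).sum() for M in (U, V, W, H)))

loss_start = loss()
for _ in range(steps):                  # joint gradient descent on all factors
    Ex, En = U @ V.T - X, U @ W.T - N
    Ey = mask * (U @ H.T - Y)           # unlabeled rows add no target error
    gU = a_x * Ex @ V + a_n * En @ W + a_y * Ey @ H + lam * U
    gV = a_x * Ex.T @ U + lam * V
    gW = a_n * En.T @ U + lam * W
    gH = a_y * Ey.T @ U + lam * H
    U -= lr * gU; V -= lr * gV; W -= lr * gW; H -= lr * gH
loss_end = loss()                       # joint objective should have decreased
```

In this sketch the weights `a_x`, `a_n`, `a_y` play the role the abstract describes: raising `a_n` makes the model lean more on the neighborhood (manifold-style) assumption, while raising `a_y` emphasizes fitting the few available labels.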
References
Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. The Journal of Machine Learning Research 7, 2399–2434 (2006)
Chapelle, O., Schölkopf, B., Zien, A. (eds.): Semi-Supervised Learning. MIT Press, Cambridge (2006)
Cozman, F., Cohen, I., Cirelo, M.: Semi-supervised learning of mixture models. In: 20th International Conference on Machine Learning, vol. 20, pp. 99–106 (2003)
Gammerman, A., Vovk, V., Vapnik, V.: Learning by Transduction. In: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pp. 148–156. Morgan Kaufmann (1998)
Joachims, T.: Transductive Inference for Text Classification using Support Vector Machines. In: Proceedings of the 1999 International Conference on Machine Learning, ICML (1999)
Liu, Y., Jin, R., Yang, L.: Semi-supervised multi-label learning by constrained non-negative matrix factorization. In: Proceedings of the National Conference on Artificial Intelligence, vol. 21, p. 421. AAAI Press (2006)
Melacci, S., Belkin, M.: Laplacian Support Vector Machines Trained in the Primal. Journal of Machine Learning Research 12, 1149–1184 (2011)
Nigam, K., McCallum, A., Mitchell, T.: Semi-supervised text classification using EM. In: Chapelle, O., Schölkopf, B., Zien, A. (eds.) Semi-Supervised Learning, pp. 33–56. The MIT Press, Cambridge (2006)
Rennie, J.: Smooth Hinge Classification (February 2005), http://people.csail.mit.edu/jrennie/writing/smoothHinge.pdf
Singh, A.P., Gordon, G.J.: Relational learning via collective matrix factorization. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 650–658. ACM, New York (2008)
Szummer, M., Jaakkola, T.: Information regularization with partially labeled data. In: Advances in Neural Information Processing Systems, vol. 15, pp. 1025–1032 (2002)
Wang, F., Li, T., Zhang, C.: Semi-supervised clustering via matrix factorization. In: Proceedings of the 2008 SIAM International Conference on Data Mining, pp. 1–12. SIAM (2008)
Weinberger, K., Packer, B., Saul, L.: Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization. In: Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pp. 381–388 (2005)
Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. In: Advances in Neural Information Processing Systems, vol. 16, pp. 321–328 (2004)
Zhu, X.: Semi-supervised learning literature survey. Tech. Rep. 1530, University of Wisconsin, Madison (December 2006)
Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. In: Proceedings of the Twentieth International Conference on Machine Learning (ICML), pp. 912–919 (2003)
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Drumond, L.R., Schmidt-Thieme, L., Freudenthaler, C., Krohn-Grimberghe, A. (2014). Collective Matrix Factorization of Predictors, Neighborhood and Targets for Semi-supervised Classification. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science(), vol 8443. Springer, Cham. https://doi.org/10.1007/978-3-319-06608-0_24
DOI: https://doi.org/10.1007/978-3-319-06608-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06607-3
Online ISBN: 978-3-319-06608-0
eBook Packages: Computer Science (R0)