Abstract
Semi-supervised learning methods are machine learning methods that use labelled points together with unlabelled data to tune a classifier. Their main idea rests on the assumption that the classification function should change smoothly over a similarity graph, which represents relations among data points. This idea can be expressed using kernels on graphs, such as the graph Laplacian. Different semi-supervised learning methods use different kernels, which reflect how the underlying similarity graph influences the classification results. In the present work, we analyse a general family of semi-supervised methods, provide insights into the differences among the methods, and give recommendations for the choice of the kernel parameters and labelled points. In particular, it appears preferable to choose a kernel based on the properties of the labelled points. We illustrate our general theoretical conclusions with an analytically tractable characteristic example, a clustered preferential attachment model, and classification of content in P2P networks.
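To make the role of the kernel concrete, here is a minimal sketch under stated assumptions. It uses one parameterisation found in the graph-based semi-supervised learning literature, F = (1 - alpha) (I - alpha D^(-sigma) W D^(sigma-1))^(-1) Y, which the abstract itself does not spell out. In that literature, sigma = 1, 1/2 and 0 are commonly associated with the standard Laplacian, normalised Laplacian and PageRank-based variants. The function name `classify`, the parameters `alpha` and `sigma`, and the toy graph are all hypothetical choices made for this illustration.

```python
import numpy as np


def classify(W, Y, alpha=0.9, sigma=0.5):
    """Score nodes with an assumed kernel family (not taken from the abstract).

    W     : symmetric (n, n) similarity matrix with positive node degrees
    Y     : (n, k) label matrix, Y[i, c] = 1 if node i is labelled with class c
    alpha : regularisation parameter in (0, 1)
    sigma : kernel parameter; 1, 0.5 and 0 select different normalisations
    """
    d = W.sum(axis=1)                      # node degrees
    left = np.diag(d ** (-sigma))          # D^(-sigma)
    right = np.diag(d ** (sigma - 1.0))    # D^(sigma - 1)
    A = np.eye(len(d)) - alpha * left @ W @ right
    # Solve A F = (1 - alpha) Y instead of forming the inverse explicitly.
    F = np.linalg.solve(A, (1.0 - alpha) * Y)
    return F.argmax(axis=1), F


# Toy similarity graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

# One labelled point per class: node 0 for class 0, node 5 for class 1.
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[5, 1] = 1.0

for sigma in (0.0, 0.5, 1.0):
    labels, scores = classify(W, Y, alpha=0.9, sigma=sigma)
    print(f"sigma={sigma}: labels={labels.tolist()}")
```

On this symmetric toy graph the predicted labels coincide for all three values of `sigma`, although the underlying scores differ. Graphs with uneven degrees or less central labelled points are where the choice of kernel and labelled data would be expected to matter.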
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Avrachenkov, K., Gonçalves, P., Sokol, M. (2013). On the Choice of Kernel and Labelled Data in Semi-supervised Learning Methods. In: Bonato, A., Mitzenmacher, M., Prałat, P. (eds) Algorithms and Models for the Web Graph. WAW 2013. Lecture Notes in Computer Science, vol 8305. Springer, Cham. https://doi.org/10.1007/978-3-319-03536-9_5
DOI: https://doi.org/10.1007/978-3-319-03536-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03535-2
Online ISBN: 978-3-319-03536-9
eBook Packages: Computer Science, Computer Science (R0)