On the Choice of Kernel and Labelled Data in Semi-supervised Learning Methods

Avrachenkov, Konstantin; Gonçalves, Paulo; Sokol, Marina

doi:10.1007/978-3-319-03536-9_5

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8305)

Included in the following conference series:

International Workshop on Algorithms and Models for the Web-Graph

Abstract

Semi-supervised learning methods are a category of machine learning methods that use labelled points together with unlabelled data to tune the classifier. They rest on the assumption that the classification function should change smoothly over a similarity graph, which represents the relations among data points. This idea can be expressed using kernels on graphs, such as the graph Laplacian. Different semi-supervised learning methods use different kernels, which reflect how the underlying similarity graph influences the classification results. In the present work, we analyse a general family of semi-supervised methods, provide insights into the differences among the methods, and give recommendations for the choice of the kernel parameters and the labelled points. In particular, it appears preferable to choose the kernel based on the properties of the labelled points. We illustrate our general theoretical conclusions with an analytically tractable characteristic example, a clustered preferential attachment model, and the classification of content in P2P networks.
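
The abstract does not state the family of methods explicitly, so the following is only a minimal sketch under stated assumptions. It assumes a label-propagation iteration of the form F <- alpha * S * F + (1 - alpha) * Y, where the kernel is S = D^(-sigma) W D^(sigma - 1), W is the similarity matrix and D the diagonal matrix of weighted degrees. The parameter names `alpha` and `sigma`, the helper names and the toy graph are all illustrative assumptions and are not taken from the paper. The sketch shows how labels placed on two points spread smoothly over a similarity graph, and how a single kernel parameter changes the operator being applied.

```python
def propagate_labels(W, Y, alpha=0.9, sigma=0.5, iters=200):
    """Iterate F <- alpha * S F + (1 - alpha) * Y, with S = D^-sigma W D^(sigma-1).

    W: symmetric similarity matrix (list of lists), every node has degree > 0.
    Y: n x k label matrix, one-hot rows for labelled points, zero rows otherwise.
    alpha, sigma: assumed regularisation and kernel parameters (illustrative).
    """
    n, k = len(W), len(Y[0])
    d = [sum(row) for row in W]  # weighted degrees
    S = [[d[i] ** (-sigma) * W[i][j] * d[j] ** (sigma - 1.0) for j in range(n)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n))
              + (1.0 - alpha) * Y[i][c]
              for c in range(k)]
             for i in range(n)]
    return F


def classify(F):
    """Assign each point to the class with the largest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in F]


# Toy similarity graph: two triangles (nodes 0-2 and 3-5) joined by edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
# One labelled point per class: node 0 is class 0, node 5 is class 1.
Y = [[1, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 1]]

labels = classify(propagate_labels(W, Y, alpha=0.9, sigma=0.5))
print(labels)
```

In this toy graph the two labelled points have the same degree, so the predicted classes do not depend on `sigma`. When labelled points differ in degree, the kernel parameter reweights their influence, which is the kind of interaction between kernel choice and labelled data that the abstract refers to.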

Author information

Authors and Affiliations

Inria Sophia Antipolis, 2004 Route des Lucioles, Sophia-Antipolis, France
Konstantin Avrachenkov & Marina Sokol
Inria Rhone-Alpes and ENS Lyon, 46 Allée Italie, Lyon, France
Paulo Gonçalves

Editor information

Editors and Affiliations

Department of Mathematics, Ryerson University, Toronto, ON, Canada
Anthony Bonato & Paweł Prałat
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
Michael Mitzenmacher

About this paper

Cite this paper

Avrachenkov, K., Gonçalves, P., Sokol, M. (2013). On the Choice of Kernel and Labelled Data in Semi-supervised Learning Methods. In: Bonato, A., Mitzenmacher, M., Prałat, P. (eds) Algorithms and Models for the Web Graph. WAW 2013. Lecture Notes in Computer Science, vol 8305. Springer, Cham. https://doi.org/10.1007/978-3-319-03536-9_5

DOI: https://doi.org/10.1007/978-3-319-03536-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03535-2
Online ISBN: 978-3-319-03536-9
eBook Packages: Computer Science, Computer Science (R0)
