On the Choice of Kernel and Labelled Data in Semi-supervised Learning Methods

  • Conference paper
Algorithms and Models for the Web Graph (WAW 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8305)

Abstract

Semi-supervised learning methods are a category of machine learning methods that use labelled points together with unlabelled data to tune the classifier. Their main idea rests on the assumption that the classification function should change smoothly over a similarity graph, which represents relations among data points. This idea can be expressed using kernels on graphs, such as the graph Laplacian. Different semi-supervised learning methods use different kernels, which reflect how the underlying similarity graph influences the classification results. In the present work, we analyse a general family of semi-supervised methods, provide insights into the differences among the methods, and give recommendations for the choice of the kernel parameters and labelled points. In particular, it appears preferable to choose a kernel based on the properties of the labelled points. We illustrate our general theoretical conclusions with an analytically tractable characteristic example, a clustered preferential attachment model, and the classification of content in P2P networks.
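
To make the idea of a family of graph kernels concrete, the following is a minimal sketch and not the authors' implementation. It assumes one common parameterisation, in which a single parameter `sigma` selects the kernel and the classification function solves F = alpha * D^(-sigma) W D^(sigma-1) F + (1 - alpha) * Y. The function name `classify`, the parameters `alpha` and `sigma`, and the toy graph are all illustrative assumptions that do not appear in the abstract.

```python
def classify(W, labels, n_classes, alpha=0.9, sigma=0.5, n_iter=200):
    """Graph-based semi-supervised classification by fixed-point iteration.

    W      : symmetric weighted adjacency matrix (list of lists), no isolated nodes
    labels : dict {node index: class index} for the labelled points
    sigma  : kernel parameter (assumed convention: 1 -> standard Laplacian,
             0.5 -> normalised Laplacian, 0 -> PageRank-based)
    """
    n = len(W)
    d = [sum(row) for row in W]  # node degrees
    # Y[i][k] = 1 if node i is labelled with class k, else 0
    Y = [[1.0 if labels.get(i) == k else 0.0 for k in range(n_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        # F <- alpha * D^(-sigma) W D^(sigma-1) F + (1 - alpha) * Y
        F = [[alpha * sum(d[i] ** (-sigma) * W[i][j] * d[j] ** (sigma - 1) * F[j][k]
                          for j in range(n) if W[i][j])
              + (1 - alpha) * Y[i][k]
              for k in range(n_classes)]
             for i in range(n)]
    # Assign each node to the class with the largest classification value
    return [max(range(n_classes), key=lambda k: F[i][k]) for i in range(n)]


# Toy similarity graph (hypothetical): two triangles joined by the edge 2-3
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
# One labelled point per class: node 0 in class 0, node 5 in class 1
predicted = classify(W, {0: 0, 5: 1}, n_classes=2)
```

On this symmetric toy graph every choice of `sigma` assigns each triangle to the class of its own labelled node. The kernels start to disagree when the labelled points differ in degree, which is the kind of dependence on the labelled points that the abstract points to.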

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Avrachenkov, K., Gonçalves, P., Sokol, M. (2013). On the Choice of Kernel and Labelled Data in Semi-supervised Learning Methods. In: Bonato, A., Mitzenmacher, M., Prałat, P. (eds) Algorithms and Models for the Web Graph. WAW 2013. Lecture Notes in Computer Science, vol 8305. Springer, Cham. https://doi.org/10.1007/978-3-319-03536-9_5

  • DOI: https://doi.org/10.1007/978-3-319-03536-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03535-2

  • Online ISBN: 978-3-319-03536-9

  • eBook Packages: Computer Science, Computer Science (R0)
