
Multilayer Graph Node Kernels: Stacking While Maintaining Convexity

Neural Processing Letters

Abstract

Developing effective techniques for learning from structured data is becoming increasingly important. In this context, kernel methods are the state-of-the-art tools and are widely adopted in real-world applications that involve learning on structured data. In contrast, for unstructured domains, deep learning methods represent a competitive, and often better, choice. In this paper we propose a new family of graph node kernels that exploits an abstract representation of the information inspired by the multilayer perceptron architecture. Our proposal combines the advantages of the two worlds: on one hand, we exploit the expressive power of state-of-the-art graph node kernels; on the other hand, we build a multilayer architecture as a series of stacked kernel pre-image estimators, trained in an unsupervised fashion via convex optimization. The hidden layers of the proposed framework are trained in a forward manner, which allows us to avoid the greedy layer-wise training of classical deep learning. Results on real-world graph datasets confirm the quality of the proposal.
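
To make the idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the kind of pipeline the abstract describes: a graph node kernel provides the initial node representation, a few kernel pre-image layers, each fitted by a convex ridge problem, are stacked in a forward fashion without backpropagation, and a standard SVM is trained on the final representation. The regularized Laplacian kernel, the RBF kernel inside each layer, the ridge-based pre-image step, and all parameter values are assumptions made for illustration only.

```python
# Illustrative sketch of a stacked kernel pre-image architecture on a graph.
# All modelling choices below are assumptions, not the authors' exact method.
import numpy as np
from numpy.linalg import solve
from sklearn.svm import SVC

def regularized_laplacian_kernel(adjacency, alpha=0.1):
    """Graph node kernel K = (I + alpha * L)^{-1}, with L the graph Laplacian."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    n = adjacency.shape[0]
    return np.linalg.inv(np.eye(n) + alpha * laplacian)

def preimage_layer(features, gamma=1.0, lam=1e-2):
    """One unsupervised layer: RBF kernel on the current representation,
    then a ridge (convex) reconstruction of the inputs from kernel space."""
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)
    n = K.shape[0]
    # Kernel ridge pre-image estimate: H = K (K + lam I)^{-1} X
    return K @ solve(K + lam * np.eye(n), features)

# Toy graph: six nodes forming two loosely connected triangles, binary labels.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Layer 0: rows of the graph node kernel serve as the initial node features.
H = regularized_laplacian_kernel(A, alpha=0.5)
# Stack two pre-image layers, trained forward, each a convex problem.
for _ in range(2):
    H = preimage_layer(H, gamma=1.0, lam=1e-2)

clf = SVC(kernel="linear", C=10.0).fit(H, y)
print(clf.score(H, y))
```

Each layer here only requires solving a regularized linear system, so the whole forward pass stays convex per layer and needs no greedy layer-wise pretraining.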



Author information

Correspondence to Luca Oneto.

Appendix: Hyperparameters Selection Results

In this appendix we report the hyperparameters selected during the model selection phase of the experiments presented in Section 5.1. From Table 3 we can draw some observations. First, the SVM C parameter is generally high, while the \(\lambda \) parameter of the pre-image estimator (the first layer) shows more variability, indicating that regularization happens mostly in the first layer. Second, the kernel parameters are fairly stable across datasets (within the same order of magnitude) for the architectures involving the LEDK and RLK kernels. On the contrary, architectures involving the MDK kernel show more variability in the selected parameters.

Table 3: Best parameters for every dataset/kernel combination for the proposed two-layer architecture
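
As a concrete illustration of how such a selection can be carried out, the sketch below (not the authors' code) runs a cross-validated grid search over the graph-kernel parameter \(\beta \), the first-layer pre-image regularizer \(\lambda \), and the SVM C, which is the kind of procedure that produces a table such as Table 3. The specific LEDK form \(\exp (-\beta L)\), the single-layer pre-image step, and the grids are assumptions made for illustration.

```python
# Illustrative model-selection loop: grid search over (beta, lambda, C),
# scored by cross-validation on the labelled nodes.  Not the authors' code.
import itertools
import numpy as np
from scipy.linalg import expm
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def ledk(adjacency, beta):
    """Laplacian exponential diffusion kernel, assumed here as K = expm(-beta * L)."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return expm(-beta * laplacian)

def first_layer(K, lam):
    """Convex pre-image step of the first layer: H = K (K + lam I)^{-1} K."""
    n = K.shape[0]
    return K @ np.linalg.solve(K + lam * np.eye(n), K)

def select_hyperparameters(adjacency, y, betas, lams, cs, folds=3):
    """Return the (beta, lambda, C) triple with the best cross-validated accuracy."""
    best = (None, -np.inf)
    for beta, lam, c in itertools.product(betas, lams, cs):
        H = first_layer(ledk(adjacency, beta), lam)
        score = cross_val_score(SVC(kernel="linear", C=c), H, y, cv=folds).mean()
        if score > best[1]:
            best = ((beta, lam, c), score)
    return best

# Hypothetical usage, with an adjacency matrix A, node labels y, and example grids:
# best_params, best_score = select_hyperparameters(
#     A, y, betas=[0.01, 0.1, 1.0], lams=[1e-3, 1e-1, 10.0], cs=[1.0, 100.0])
```

A pattern like the one described in the appendix (a consistently large C together with a more variable \(\lambda \)) would show up directly in the selected triples returned by such a search.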


Cite this article

Oneto, L., Navarin, N., Sperduti, A. et al. Multilayer Graph Node Kernels: Stacking While Maintaining Convexity. Neural Process Lett 48, 649–667 (2018). https://doi.org/10.1007/s11063-017-9742-z
