Skip to main content
Log in

An unsupervised approach to learn the kernel functions: from global influence to local similarity

  • Original Article
  • Published:
Neural Computing and Applications Aims and scope Submit manuscript

Abstract

Recently there has been a steep growth in the development of kernel-based learning algorithms. The intrinsic problem in such algorithms is the selection of the optimal kernel for the learning task of interest. In this paper, we propose an unsupervised approach to learn a linear combination of kernel functions, such that the resulting kernel best serves the objectives of the learning task. This is achieved through measuring the influence of each point on the structure of the dataset. This measure is calculated by constructing a weighted graph on which a random walk is performed. The measure of influence in the feature space is probabilistically related to the input space that yields an optimization problem to be solved. The optimization problem is formulated in two different convex settings, namely linear and semidefinite programming, dependent on the type of kernel combination considered. The contributions of this paper are twofold: first, a novel unsupervised approach to learn the kernel function, and second, a method to infer the local similarity represented by the kernel function by measuring the global influence of each point toward the structure of the dataset. The proposed approach focuses on the kernel selection which is independent of the kernel-based learning algorithm. The empirical evaluation of the proposed approach with various datasets shows the effectiveness of the algorithm in practice.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

Notes

  1. Available at: http://www.people.csail.mit.edu/jrennie/20Newsgroups/.

References

  1. Amari S, Wu S (1999) Improving support vector machine classifiers by modifying kernal functions. Neural Netw 12(6):783–789. doi:10.1016/S0893-6080(99)00032-5

  2. Bach FR, Lanckriet GRG, Jordan MI (2004) Multiple kernel learning, conic duality, and the smo algorithm. In: ICML ’04: proceedings of the twenty-first international conference on Machine learning. ACM, New York, NY, USA, p 6. doi:10.1145/1015330.1015424

  3. Ben-Hur A, Horn D, Siegelmann HT, Vapnik V (2002) Support vector clustering. J Mach Learn Res 2:125–137

    Article  MATH  Google Scholar 

  4. Burges CJ (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2:121–167

    Article  Google Scholar 

  5. Burges CJC (1999) Geometry and invariance in kernel based methods. pp 89–116

  6. Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. Software available at: http://www.csie.ntu.edu.tw/cjlin/libsvm

  7. Coifman RR, Lafon S (2006) Diffusion maps. Appl Comput Harmon Anal 21(1):5–30. doi:10.1016/j.acha.2006.04.006. http://www.dx.doi.org/10.1016/j.acha.2006.04.006

  8. Cristianini N, Shawe-taylor J, Elissee A, Kandola J (2002) On kernel-target alignment. In: Advances in neural information processing systems 14. MIT Press, pp 367–373

  9. Duan K, Keerthi SS, Poo AN (2003) Evaluation of simple performance measures for tuning svm hyperparameters. Neurocomputing 51:41–59. doi:10.1016/S0925-2312(02)00601-X. http://www.sciencedirect.com/science/article/B6V10-4625PVR-1/2/cc81e4581eca9413c3784c75c1ba0ee1

    Google Scholar 

  10. Graf A, Smola A, Borer S (2003) Classification in a normalized feature space using support vector machines. Neural Netw, IEEE Trans 14(3):597–605. doi:10.1109/TNN.2003.811708

    Article  Google Scholar 

  11. Grant M, Boyd S (2008) Graph implementations for nonsmooth convex programs. http://www.dx.doi.org/10.1007/978-1-84800-155-8_7

  12. Grant M, Boyd S (2009) Cvx: Matlab software for disciplined convex programming web page and software. http://www.stanford.edu/boyd/cvx

  13. Guo Y, Gao J, Kwan P (2008) Twin kernel embedding. Pattern Anal Mach Intell, IEEE Trans 30(8):1490–1495. doi:10.1109/TPAMI.2008.74

    Article  Google Scholar 

  14. Herbster M, Pontil M, Wainer L (2005) Online learning over graphs. In: ICML ’05: proceedings of the 22nd international conference on Machine learning. ACM, New York, NY, USA, pp 305–312. http://www.doi.acm.org/10.1145/1102351.1102390

  15. Kandola J, Shawe-Taylor J, Cristianini N (2002) On the extensions of kernel alignment. NC-TR-2002-120. http://www.eprints.ecs.soton.ac.uk/9745

  16. Kandola J, Shawe-Taylor J, Cristianini N (2002) Optimizing kernel alignment over combinations of kernel. http://www.eprints.ecs.soton.ac.uk/9746/

  17. Kim SJ, Zymnis A, Magnani A, Koh K, Boyd S (2008) Learning the kernel via convex optimization. pp 1997–2000. doi:10.1109/ICASSP.2008.4518030

  18. Kondor RI, Lafferty J (2002) Diffusion kernels on graphs and other discrete structures. In: Proceedings of the ICML, pp 315–322

  19. Kulis B, Sustik M, Dhillon I (2006) Learning low-rank kernel matrices. In: ICML ’06: Proceedings of the 23rd international conference on Machine learning. ACM, New York, NY, USA, pp 505–512. http://www.doi.acm.org/10.1145/1143844.1143908

  20. Vandenberghe L, Vandenberghe SB, Boyd S (1996) Semidefinite programming. SIAM Rev 38(1):49–95. http://www.jstor.org/stable/2132974

    Google Scholar 

  21. Lafon S, Lee A (2006) Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. Pattern Anal Mach Intell, IEEE Trans 28(9):1393–1403. doi:10.1109/TPAMI.2006.184

    Article  Google Scholar 

  22. Lanckriet GRG, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J Mach Learn Res 5:27–72

    Google Scholar 

  23. Marie Szafranski YG, Rakotomamonjy A (2008) Composite kernel learning. In: ICML

  24. Meila M, Shi J (2001) A random walks view of spectral segmentation

  25. Micchelli CA, Pontil M (2005) Learning the kernel function via regularization. J Mach Learn Res 6:1099–1125

    MathSciNet  Google Scholar 

  26. Mika S, Ratsch G, Weston J, Scholkopf B, Mullers K (1999) Fisher discriminant analysis with kernels, pp 41–48. doi:10.1109/NNSP.1999.788121

  27. Muller KR, Mika S, Ratsch G, Tsuda K, Scholkopf B (2001) An introduction to kernel-based learning algorithms. Neural Netw, IEEE Trans 12(2):181–201. doi:10.1109/72.914517

    Article  Google Scholar 

  28. Nadler B, Lafon S, Coifman R, Kevrekidis I (2007) Diffusion maps-a probabilistic interpretation for spectral embedding and clustering algorithms. http://www.dx.doi.org/10.1007/978-3-540-73750-6_10

  29. Rakotomamonjy A, Bach F, Canu S, Grandvalet Y (2007) More efficiency in multiple kernel learning. In: ICML ’07: Proceedings of the 24th international conference on Machine learning. ACM, New York, NY, USA, pp 775–782. http://www.doi.acm.org/10.1145/1273496.1273594

  30. Scholkopf B, Smola A, Muller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comp 10(5):1299–1319. http://neco.mitpress.org/cgi/content/abstract/10/5/1299

    Google Scholar 

  31. Shaw B, Jebara T (2009) Structure preserving embedding. In: ICML ’09: Proceedings of the 26th annual international conference on machine learning. ACM, New York, NY, USA, pp 937–944.doi:10.1145/1553374.1553494

  32. Smola A, Hofmann T, Scholkopf B (2007) Kernel methods in machine learning. Ann Stat 36:1171–1220. http://eprints.pascal-network.org/archive/00003984

    Google Scholar 

  33. Society FRKCAM (1997) Spectral graph theory. Regional conference series in mathmatics. American Mathematical Society, Providence, RI

  34. Terence Sim SB, Bsat M (2001) The cmu pose, illumination, and expression (pie) database of human faces. Technical Reports CMU-RI-TR-01-02, Robotics Institute, Pittsburgh, PA

  35. Vapnik V (1999) An overview of statistical learning theory. Neural Netw, IEEE Trans 10(5):988–999. doi:10.1109/72.788640

    Article  Google Scholar 

  36. Weinberger KQ, Packer BD, Saul LK (2005) Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization. In: Proceedings of the tenth international workshop on artificial intelligence and statistics, pp 381–388

  37. Weinberger KQ, Saul LK (2006) An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In: Unfolding, Proceedings of the 21st national conference on artificial intelligence. AAAI

  38. Weinberger KQ, Sha F, Saul LK (2004) Learning a kernel matrix for nonlinear dimensionality reduction. In: ICML ’04: Proceedings of the twenty-first international conference on Machine learning. ACM, New York, NY, USA, p 106. doi:10.1145/1015330.1015345

  39. Xiong H, Swamy M, Ahmad M (2005) Optimizing the kernel in the empirical feature space. Neural Netw, IEEE Trans 16(2):460–474. doi:10.1109/TNN.2004.841784

    Article  Google Scholar 

  40. Yang MH, Ahuj N, Kriegman D (2000) Face recognition using kernel eigenfaces. In: Image Processing, 2000. Proceedings. 2000 International Conference on, vol 1, pp 37–40. doi:10.1109/ICIP.2000.900886

  41. Zhang X, Lee WS (2006) Hyperparameter learning for graph based semi-supervised learning algorithms. In: NIPS. http://books.nips.cc/papers/files/nips19/NIPS2006_0148.pdf

  42. Zhengdong Lu, Dhillon PJI (2009) Geometry-aware metric learning. In: ICML ’09: Proceedings of the 26nd international conference on Machine learning. ACM

Download references

Acknowledgments

This research has been made possible through the Science Fund Grant “Delineation and 3D Visualization of Tumor and Risk Structures” (DVTRS), No: 1001/PKOMP/817001 by the Ministry of Science, Technology and Innovation of Malaysia.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Dhanesh Ramachandram.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Abbasnejad, M.E., Ramachandram, D. & Mandava, R. An unsupervised approach to learn the kernel functions: from global influence to local similarity. Neural Comput & Applic 20, 703–715 (2011). https://doi.org/10.1007/s00521-010-0411-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-010-0411-7

Keywords

Navigation