Kernel learning and optimization with Hilbert–Schmidt independence criterion

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Measures of statistical dependence between random variables have been successfully applied in many machine learning tasks, such as independent component analysis, feature selection, clustering, and dimensionality reduction. This success rests on the fact that many existing learning tasks can be cast as problems of dependence maximization (or minimization). Motivated by this, we present a unifying view of kernel learning via statistical dependence estimation. The key idea is that a good kernel should maximize the statistical dependence between the kernel and the class labels. The dependence is measured by the Hilbert–Schmidt independence criterion (HSIC), which computes the Hilbert–Schmidt norm of the cross-covariance operator between samples mapped into the corresponding Hilbert spaces and is traditionally used to measure the statistical dependence between random variables. As a special case of kernel learning, we propose a Gaussian kernel optimization method for classification that maximizes the HSIC, considering two forms of Gaussian kernel (the spherical kernel and the ellipsoidal kernel). Extensive experiments on real-world data sets from the UCI benchmark repository validate the superiority of the proposed approach in terms of both prediction accuracy and computational efficiency.
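
The quantity the abstract describes has a standard empirical form: given a kernel matrix K over the inputs and a kernel matrix L over the labels, the biased HSIC estimate of Gretton et al. (2005) is tr(KHLH)/(n-1)^2, where H = I - (1/n)11^T is the centering matrix and n is the sample size. The sketch below shows how the width of a spherical Gaussian kernel could be chosen by maximizing this estimate. It is a minimal illustration under stated assumptions: the grid search over sigma and the 0/1 same-class label kernel are illustrative choices, not necessarily the optimization procedure or label kernel used in the paper.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Spherical Gaussian (RBF) kernel: K_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def label_kernel(y):
    """A simple label kernel: L_ij = 1 if y_i == y_j, else 0 (an illustrative choice)."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

def hsic(K, L):
    """Biased empirical HSIC estimate tr(K H L H) / (n - 1)^2 (Gretton et al., 2005)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Hypothetical data: 100 samples, 5 features, class determined by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(int)

# Select the Gaussian width that maximizes dependence between kernel and labels.
L = label_kernel(y)
sigmas = [0.1, 0.5, 1.0, 2.0, 5.0]
best_sigma = max(sigmas, key=lambda s: hsic(gaussian_kernel(X, s), L))
print("sigma maximizing HSIC:", best_sigma)
```

For the ellipsoidal kernel mentioned in the abstract, the single width sigma would presumably be replaced by one width per input dimension, turning the selection into a multivariate optimization (e.g., gradient ascent on the HSIC estimate rather than a grid search); the paper's exact procedure is not reproduced here.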



Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (No. 61562003), the Natural Science Foundation of Jiangxi Province of China (No. 20161BAB202070), and the China Scholarship Council (No. 201508360144). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

Author information


Corresponding author

Correspondence to Tinghua Wang.


About this article


Cite this article

Wang, T., Li, W. Kernel learning and optimization with Hilbert–Schmidt independence criterion. Int. J. Mach. Learn. & Cyber. 9, 1707–1717 (2018). https://doi.org/10.1007/s13042-017-0675-7

