Entropy Power Inequality for Learning Optimal Combination of Kernel Functions

Published in: Journal of Signal Processing Systems

Abstract

Kernel methods have become a standard tool for a wide range of data analyses and are extensively used in signal processing, including the analysis of speech, images, time series, and DNA sequences. The main difficulty in applying kernel methods lies in designing an appropriate kernel for the data at hand, and multiple kernel learning (MKL) is one of the principled approaches to this kernel design problem. This paper proposes a novel multiple kernel learning method based on a notion of Gaussianity evaluated through the entropy power inequality. The proposed method is notable for utilizing the entropy power inequality for kernel learning, and for realizing an MKL algorithm that optimizes only the kernel combination coefficients, whereas conventional methods must optimize both the combination coefficients and the classifier parameters. The proposed approach is shown to achieve good classification accuracy on a set of standard benchmark datasets for binary classification.
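The full algorithm is not reproduced in this preview, but the two ingredients named in the abstract can be illustrated: a convex combination of base kernels parameterized only by its coefficients, and an entropy-power quantity (for a Gaussian variable, the entropy power equals its variance, which is what makes it a natural Gaussianity score). The sketch below is not the paper's method; `combine_kernels` and `entropy_power_1d` are hypothetical helper names, and the entropy estimate uses the standard 1-nearest-neighbour (Kozachenko-Leonenko) estimator from the nonparametric entropy estimation literature.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Convex combination sum_i w_i K_i of precomputed Gram matrices.

    In MKL these weights w_i are the only parameters being learned.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to one")
    return sum(w * K for w, K in zip(weights, kernels))

def entropy_power_1d(x):
    """Entropy power exp(2 h(X)) / (2 pi e) of a 1-D sample.

    The differential entropy h(X) (in nats) is estimated with the
    1-nearest-neighbour Kozachenko-Leonenko estimator; for a Gaussian
    sample the returned value approaches the sample variance.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    gaps = np.diff(x)
    # 1-NN distance of each point: min of the gap to its left/right neighbour
    nn = np.minimum(np.r_[np.inf, gaps], np.r_[gaps, np.inf])
    nn = np.maximum(nn, 1e-12)  # guard against duplicate points
    euler_gamma = 0.57721566490153286
    h = np.mean(np.log(2.0 * (n - 1) * nn)) + euler_gamma
    return np.exp(2.0 * h) / (2.0 * np.pi * np.e)
```

Because the entropy power equals the variance exactly when the variable is Gaussian, the gap between the two can serve as a nonparametric Gaussianity score of the combined-kernel feature projections, to be driven by the combination coefficients alone.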

Notes

  1. http://www.cs.tsukuba.ac.jp/~hinohide/EPIK.zip

Author information

Corresponding author

Correspondence to Hideitsu Hino.

Additional information

Part of this work is supported by JSPS Kakenhi No.25870811.

About this article

Cite this article

Hino, H. Entropy Power Inequality for Learning Optimal Combination of Kernel Functions. J Sign Process Syst 79, 201–210 (2015). https://doi.org/10.1007/s11265-014-0899-7
