A Hilbert Space Embedding for Distributions

  • Conference paper
Algorithmic Learning Theory (ALT 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4754)

Abstract

We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique include two-sample tests, which determine whether two sets of observations arise from the same distribution, as well as covariate shift correction, local learning, measures of independence, and density estimation.
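The embedding in question maps a distribution P to its kernel mean μ[P] := E_{x∼P}[k(x, ·)] in the RKHS induced by a kernel k; two distributions are then compared via the distance ‖μ[P] − μ[Q]‖, the maximum mean discrepancy (MMD) underlying the two-sample tests mentioned above. Since ‖μ[P] − μ[Q]‖² = E k(x, x′) + E k(y, y′) − 2 E k(x, y), the comparison needs only kernel evaluations between samples, which is what lets it bypass density estimation. Below is a minimal illustrative sketch, not code from the paper: the Gaussian kernel choice, bandwidth, and function names are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), computed as a full Gram matrix.
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased empirical estimate of ||mu[P] - mu[Q]||^2: the mean of k over
    # (X, X) plus the mean over (Y, Y) minus twice the mean over (X, Y).
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

# Samples from the same distribution give an estimate near zero;
# a mean-shifted distribution gives a clearly positive value.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = rng.normal(size=(200, 2))           # same distribution as X
Z = rng.normal(loc=1.0, size=(200, 2))  # mean-shifted distribution
print(mmd2_biased(X, Y))  # small
print(mmd2_biased(X, Z))  # large by comparison
```

The biased estimator shown here simply replaces the three expectations with sample averages; unbiased variants and the associated test thresholds are developed in the two-sample test literature the paper builds on.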

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Smola, A., Gretton, A., Song, L., Schölkopf, B. (2007). A Hilbert Space Embedding for Distributions. In: Hutter, M., Servedio, R.A., Takimoto, E. (eds.) Algorithmic Learning Theory. ALT 2007. Lecture Notes in Computer Science, vol 4754. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75225-7_5

  • DOI: https://doi.org/10.1007/978-3-540-75225-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75224-0

  • Online ISBN: 978-3-540-75225-7

  • eBook Packages: Computer Science
