A Hilbert Space Embedding for Distributions

  • Conference paper
Algorithmic Learning Theory (ALT 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4754)

Abstract

We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique include two-sample tests, which determine whether two sets of observations arise from the same distribution, as well as covariate shift correction, local learning, measures of independence, and density estimation.
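The embedding in question maps a distribution P to its kernel mean μ[P] := E_{x∼P}[k(x, ·)] in the RKHS induced by a kernel k; two distributions are then compared via the distance ‖μ[P] − μ[Q]‖, the maximum mean discrepancy (MMD) underlying the two-sample tests mentioned above. Since ‖μ[P] − μ[Q]‖² = E k(x, x′) + E k(y, y′) − 2 E k(x, y), the comparison needs only kernel evaluations between samples, which is what lets it bypass density estimation. Below is a minimal illustrative sketch, not code from the paper: the Gaussian kernel choice, bandwidth, and function names are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), computed as a full Gram matrix.
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased empirical estimate of ||mu[P] - mu[Q]||^2: the mean of k over
    # (X, X) plus the mean over (Y, Y) minus twice the mean over (X, Y).
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

# Samples from the same distribution give an estimate near zero;
# a mean-shifted distribution gives a clearly positive value.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = rng.normal(size=(200, 2))           # same distribution as X
Z = rng.normal(loc=1.0, size=(200, 2))  # mean-shifted distribution
print(mmd2_biased(X, Y))  # small
print(mmd2_biased(X, Z))  # large by comparison
```

The biased estimator shown here simply replaces the three expectations with sample averages; unbiased variants and the associated test thresholds are developed in the two-sample test literature the paper builds on.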

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Smola, A., Gretton, A., Song, L., Schölkopf, B. (2007). A Hilbert Space Embedding for Distributions. In: Hutter, M., Servedio, R.A., Takimoto, E. (eds.) Algorithmic Learning Theory. ALT 2007. Lecture Notes in Computer Science, vol 4754. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75225-7_5

  • DOI: https://doi.org/10.1007/978-3-540-75225-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75224-0

  • Online ISBN: 978-3-540-75225-7

  • eBook Packages: Computer Science
