Direct Approximation of Divergences Between Probability Distributions

Abstract

Approximating a divergence between two probability distributions from their samples is a fundamental challenge in the statistics, information theory, and machine learning communities, because a divergence estimator can be used for various purposes such as two-sample homogeneity testing, change-point detection, and class-balance estimation. Furthermore, an approximator of a divergence between the joint distribution and the product of marginals can be used for independence testing, which has a wide range of applications including feature selection and extraction, clustering, object matching, independent component analysis (ICA), and causality learning. In this chapter, we review recent advances in direct divergence approximation that follow the general inference principle advocated by Vladimir Vapnik: one should not solve a more general problem as an intermediate step. More specifically, direct divergence approximation avoids separately estimating two probability distributions when approximating a divergence. We cover direct approximators of the Kullback–Leibler (KL) divergence, the Pearson (PE) divergence, the relative PE (rPE) divergence, and the L2-distance. Despite the overwhelming popularity of the KL divergence, we argue that the latter approximators are more useful in practice due to their computational efficiency, high numerical stability, and superior robustness against outliers.
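
For concreteness, the divergences named above are commonly defined as follows (standard definitions, stated here for orientation rather than quoted from the chapter):

  • KL divergence: KL(p‖q) = ∫ p(x) log( p(x) / q(x) ) dx
  • PE divergence: PE(p‖q) = (1/2) ∫ q(x) ( p(x)/q(x) − 1 )² dx
  • rPE divergence: rPE_α(p‖q) = PE(p‖q_α), where q_α(x) = α p(x) + (1 − α) q(x) for some α ∈ [0, 1)
  • L2-distance: L2(p, q) = ∫ ( p(x) − q(x) )² dx

The sketch below illustrates the "direct" recipe for the PE divergence under stated assumptions: the density ratio p(x)/q(x) is modelled by a Gaussian-kernel expansion, its coefficients are fitted by regularized least squares (which admits a closed-form solution), and the fitted ratio is plugged into the PE divergence, so that neither p nor q is estimated separately. This is a minimal sketch of a least-squares density-ratio approach, not the chapter's exact algorithm; the function name pe_divergence_ulsif, the fixed bandwidth sigma, and the regularization parameter lam are illustrative choices, and in practice such hyper-parameters would be selected by cross-validation.

```python
import numpy as np

def pe_divergence_ulsif(x_p, x_q, sigma=1.0, lam=1e-3, n_centers=100, seed=0):
    """Direct Pearson (PE) divergence approximation via least-squares
    density-ratio fitting (illustrative sketch).

    The ratio r(x) = p(x)/q(x) is modelled as a Gaussian-kernel expansion
    r(x) = sum_l theta_l * K(x, c_l); theta is fitted by a regularized
    least-squares criterion with a closed-form solution, and the PE
    divergence is read off from the fitted ratio.
    """
    rng = np.random.default_rng(seed)
    x_p = np.asarray(x_p, dtype=float)   # samples from p, shape (n_p, d)
    x_q = np.asarray(x_q, dtype=float)   # samples from q, shape (n_q, d)
    if x_p.ndim == 1:
        x_p = x_p[:, None]
    if x_q.ndim == 1:
        x_q = x_q[:, None]

    # Kernel centers chosen among the p-samples (a common heuristic).
    idx = rng.choice(len(x_p), size=min(n_centers, len(x_p)), replace=False)
    centers = x_p[idx]

    def gauss_kernel(x, c):
        # Pairwise squared distances -> Gaussian kernel matrix, shape (n, b).
        d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    phi_p = gauss_kernel(x_p, centers)   # design matrix on p-samples
    phi_q = gauss_kernel(x_q, centers)   # design matrix on q-samples

    H = phi_q.T @ phi_q / len(x_q)       # empirical E_q[phi(x) phi(x)^T]
    h = phi_p.mean(axis=0)               # empirical E_p[phi(x)]
    b = H.shape[0]

    # Closed-form ridge solution: theta = (H + lam I)^{-1} h.
    theta = np.linalg.solve(H + lam * np.eye(b), h)

    # Plug-in PE estimate: PE ≈ h^T theta − (1/2) theta^T H theta − 1/2.
    return h @ theta - 0.5 * theta @ H @ theta - 0.5
```

As a quick sanity check, calling pe_divergence_ulsif on two samples drawn from the same distribution should return a value close to zero, while samples from well-separated distributions should give a clearly positive value.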


Acknowledgements

The author acknowledges support from the JST PRESTO program, KAKENHI 25700022, the FIRST program, and AOARD.

Author information

Correspondence to Masashi Sugiyama.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sugiyama, M. (2013). Direct Approximation of Divergences Between Probability Distributions. In: Schölkopf, B., Luo, Z., Vovk, V. (eds) Empirical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41136-6_23

  • DOI: https://doi.org/10.1007/978-3-642-41136-6_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41135-9

  • Online ISBN: 978-3-642-41136-6
