Direct Approximation of Divergences Between Probability Distributions

Abstract

Approximating a divergence between two probability distributions from their samples is a fundamental challenge in the statistics, information theory, and machine learning communities, because a divergence estimator can be used for various purposes such as two-sample homogeneity testing, change-point detection, and class-balance estimation. Furthermore, an approximator of a divergence between the joint distribution and the product of marginals can be used for independence testing, which has a wide range of applications including feature selection and extraction, clustering, object matching, independent component analysis (ICA), and causality learning. In this chapter, we review recent advances in direct divergence approximation that follow the general inference principle advocated by Vladimir Vapnik: one should not solve a more general problem as an intermediate step. More specifically, direct divergence approximation avoids separately estimating two probability distributions when approximating a divergence. We cover direct approximators of the Kullback–Leibler (KL) divergence, the Pearson (PE) divergence, the relative PE (rPE) divergence, and the L2-distance. Despite the overwhelming popularity of the KL divergence, we argue that the latter approximators are more useful in practice due to their computational efficiency, high numerical stability, and superior robustness against outliers.
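
For concreteness, the divergences named above are commonly defined as follows (standard definitions, stated here for orientation rather than quoted from the chapter):

  • KL divergence: KL(p‖q) = ∫ p(x) log( p(x) / q(x) ) dx
  • PE divergence: PE(p‖q) = (1/2) ∫ q(x) ( p(x)/q(x) − 1 )² dx
  • rPE divergence: rPE_α(p‖q) = PE(p‖q_α), where q_α(x) = α p(x) + (1 − α) q(x) for some α ∈ [0, 1)
  • L2-distance: L2(p, q) = ∫ ( p(x) − q(x) )² dx

The sketch below illustrates the "direct" recipe for the PE divergence under stated assumptions: the density ratio p(x)/q(x) is modelled by a Gaussian-kernel expansion, its coefficients are fitted by regularized least squares (which admits a closed-form solution), and the fitted ratio is plugged into the PE divergence, so that neither p nor q is estimated separately. This is a minimal sketch of a least-squares density-ratio approach, not the chapter's exact algorithm; the function name pe_divergence_ulsif, the fixed bandwidth sigma, and the regularization parameter lam are illustrative choices, and in practice such hyper-parameters would be selected by cross-validation.

```python
import numpy as np

def pe_divergence_ulsif(x_p, x_q, sigma=1.0, lam=1e-3, n_centers=100, seed=0):
    """Direct Pearson (PE) divergence approximation via least-squares
    density-ratio fitting (illustrative sketch).

    The ratio r(x) = p(x)/q(x) is modelled as a Gaussian-kernel expansion
    r(x) = sum_l theta_l * K(x, c_l); theta is fitted by a regularized
    least-squares criterion with a closed-form solution, and the PE
    divergence is read off from the fitted ratio.
    """
    rng = np.random.default_rng(seed)
    x_p = np.asarray(x_p, dtype=float)   # samples from p, shape (n_p, d)
    x_q = np.asarray(x_q, dtype=float)   # samples from q, shape (n_q, d)
    if x_p.ndim == 1:
        x_p = x_p[:, None]
    if x_q.ndim == 1:
        x_q = x_q[:, None]

    # Kernel centers chosen among the p-samples (a common heuristic).
    idx = rng.choice(len(x_p), size=min(n_centers, len(x_p)), replace=False)
    centers = x_p[idx]

    def gauss_kernel(x, c):
        # Pairwise squared distances -> Gaussian kernel matrix, shape (n, b).
        d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    phi_p = gauss_kernel(x_p, centers)   # design matrix on p-samples
    phi_q = gauss_kernel(x_q, centers)   # design matrix on q-samples

    H = phi_q.T @ phi_q / len(x_q)       # empirical E_q[phi(x) phi(x)^T]
    h = phi_p.mean(axis=0)               # empirical E_p[phi(x)]
    b = H.shape[0]

    # Closed-form ridge solution: theta = (H + lam I)^{-1} h.
    theta = np.linalg.solve(H + lam * np.eye(b), h)

    # Plug-in PE estimate: PE ≈ h^T theta − (1/2) theta^T H theta − 1/2.
    return h @ theta - 0.5 * theta @ H @ theta - 0.5
```

As a quick sanity check, calling pe_divergence_ulsif on two samples drawn from the same distribution should return a value close to zero, while samples from well-separated distributions should give a clearly positive value.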


Acknowledgements

The author acknowledges support from the JST PRESTO program, KAKENHI 25700022, the FIRST program, and AOARD.

Author information

Correspondence to Masashi Sugiyama.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sugiyama, M. (2013). Direct Approximation of Divergences Between Probability Distributions. In: Schölkopf, B., Luo, Z., Vovk, V. (eds) Empirical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41136-6_23

  • DOI: https://doi.org/10.1007/978-3-642-41136-6_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41135-9

  • Online ISBN: 978-3-642-41136-6
