Sketching Information Divergences

  • Conference paper
Learning Theory (COLT 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4539)

Abstract

When comparing discrete probability distributions, the natural measures of similarity are not ℓ_p distances but information divergences such as Kullback-Leibler and Hellinger. This paper considers some of the issues in constructing small-space sketches of distributions, a concept related to dimensionality reduction, such that these measures can be approximately computed from the sketches. Related problems for ℓ_p distances are reasonably well understood via a series of results including Johnson, Lindenstrauss [27,18], Alon, Matias, Szegedy [1], Indyk [24], and Brinkman, Charikar [8]. In contrast, almost no analogous results are known to date about constructing sketches for the information divergences used in statistics and learning theory.
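
To make the contrast in the abstract concrete, the following minimal Python sketch computes the two divergences it names and, separately, a Johnson-Lindenstrauss-style random projection that approximates the ℓ_2 distance from short summaries. This is an illustration under stated assumptions, not the paper's construction: the function names are hypothetical, the Hellinger quantity uses one common normalization (the unscaled sum of squared differences of square roots), and the Gaussian projection is the textbook ℓ_2 technique rather than anything specific to this paper.

```python
import math
import random

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * ln(p_i / q_i), using natural logarithms.
    Returns infinity if q_i = 0 at a coordinate where p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * ln(0 / q_i) is taken to be 0
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def hellinger_squared(p, q):
    """Assumed normalization: sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

def l2_sketch(v, k, seed):
    """Project v onto k random Gaussian directions, scaled by 1/sqrt(k).
    Using the same seed for two vectors of equal length means both are
    multiplied by the same random matrix, so the sketch is linear."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) * x for x in v) / math.sqrt(k)
        for _ in range(k)
    ]

def l2_from_sketches(sketch_p, sketch_q):
    """Estimate ||p - q||_2 from the two k-dimensional sketches alone."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sketch_p, sketch_q)))

if __name__ == "__main__":
    p = [0.5, 0.25, 0.125, 0.125]
    q = [0.25, 0.25, 0.25, 0.25]
    print("KL(p || q)      :", kl_divergence(p, q))
    print("Hellinger^2     :", hellinger_squared(p, q))
    exact = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    approx = l2_from_sketches(l2_sketch(p, 400, seed=7), l2_sketch(q, 400, seed=7))
    print("l2 exact/approx :", exact, approx)
```

The projection works for ℓ_2 because the sketch is a linear map, so the difference of two sketches equals the sketch of the difference. The divergences above are not functions of p - q alone, which is one intuition for why the same approach does not carry over directly.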

References

  • [1] Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58(1), 137–147 (1999)

  • [2] Amari, S.-I.: Differential-geometrical methods in statistics. Springer-Verlag, New York (1985)

  • [3] Amari, S.-I., Nagaoka, H.: Methods of Information Geometry. Oxford University Press and AMS Translations of Mathematical Monographs (2000)

  • [4] Bhuvanagiri, L., Ganguly, S., Kesh, D., Saha, C.: Simpler algorithm for estimating frequency moments of data streams. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 708–713 (2006)

  • [5] Bose, P., Kranakis, E., Morin, P., Tang, Y.: Bounds for frequency estimation of packet streams. In: SIROCCO, pp. 33–42 (2003)

  • [6] Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. U.S.S.R. Computational Mathematics and Mathematical Physics 7(1), 200–217 (1967)

  • [7] Breiman, L.: Prediction games and arcing algorithms. Neural Computation 11(7), 1493–1517 (1999)

  • [8] Brinkman, B., Charikar, M.: On the impossibility of dimension reduction in ℓ_1. In: IEEE Symposium on Foundations of Computer Science, pp. 514–523 (2003)

  • [9] Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. J. Comput. Syst. Sci. 60(3), 630–659 (2000)

  • [10] Chakrabarti, A., Cormode, G., McGregor, A.: A near-optimal algorithm for computing the entropy of a stream. In: ACM-SIAM Symposium on Discrete Algorithms (2007)

  • [11] Chakrabarti, A., Khot, S., Sun, X.: Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In: IEEE Conference on Computational Complexity, pp. 107–117 (2003)

  • [12] Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: International Colloquium on Automata, Languages and Programming, pp. 693–703 (2002)

  • [13] Collins, M., Schapire, R.E., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. Machine Learning 48(1-3), 253–285 (2002)

  • [14] Cormode, G., Datar, M., Indyk, P., Muthukrishnan, S.: Comparing data streams using Hamming norms (how to zero in). IEEE Trans. Knowl. Data Eng. 15(3), 529–540 (2003)

  • [15] Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)

  • [16] Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley Series in Telecommunications. John Wiley & Sons, New York, NY, USA (1991)

  • [17] Csiszár, I.: Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Statist., pp. 2032–2056 (1991)

  • [18] Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22(1), 60–65 (2003)

  • [19] Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: ESA, pp. 348–360 (2002)

  • [20] Feigenbaum, J., Kannan, S., Strauss, M., Viswanathan, M.: An approximate L1-difference algorithm for massive data streams. SIAM Journal on Computing 32(1), 131–151 (2002)

  • [21] Guha, S., McGregor, A.: Space-efficient sampling. In: AISTATS, pp. 169–176 (2007)

  • [22] Guha, S., McGregor, A., Venkatasubramanian, S.: Streaming and sublinear approximation of entropy and information distances. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 733–742 (2006)

  • [23] Henzinger, M.R., Raghavan, P., Rajagopalan, S.: Computing on data streams. In: External Memory Algorithms, pp. 107–118 (1999)

  • [24] Indyk, P.: Stable distributions, pseudorandom generators, embeddings and data stream computation. In: IEEE Symposium on Foundations of Computer Science, pp. 189–197 (2000)

  • [25] Indyk, P., Woodruff, D.P.: Optimal approximations of the frequency moments of data streams. In: ACM Symposium on Theory of Computing, pp. 202–208 (2005)

  • [26] Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Annals of Statistics 28, 337–407 (2000)

  • [27] Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics 26, 189–206 (May 1984)

  • [28] Kivinen, J., Warmuth, M.K.: Boosting as entropy projection. In: COLT, pp. 134–144 (1999)

  • [29] Lafferty, J.D.: Additive models, boosting, and inference for generalized divergences. In: COLT, pp. 125–133 (1999)

  • [30] Lafferty, J.D., Della Pietra, S., Della Pietra, V.J.: Statistical learning algorithms based on Bregman distances. In: Canadian Workshop on Information Theory (1997)

  • [31] Liese, F., Vajda, I.: Convex statistical distances. Teubner-Texte zur Mathematik, Band 95, Leipzig (1987)

  • [32] Mason, L., Baxter, J., Bartlett, P., Frean, M.: Functional gradient techniques for combining hypotheses. In: Advances in Large Margin Classifiers, MIT Press, Cambridge (1999)

  • [33] Misra, J., Gries, D.: Finding repeated elements. Sci. Comput. Program. 2(2), 143–152 (1982)

  • [34] Nguyen, X., Wainwright, M.J., Jordan, M.I.: Divergences, surrogate loss functions and experimental design. In: Proceedings of NIPS (2005)

  • [35] Razborov, A.A.: On the distributional complexity of disjointness. Theor. Comput. Sci. 106(2), 385–390 (1992)

  • [36] Saks, M.E., Sun, X.: Space lower bounds for distance approximation in the data stream model. In: ACM Symposium on Theory of Computing, pp. 360–369 (2002)

Editor information

Nader H. Bshouty, Claudio Gentile

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Guha, S., Indyk, P., McGregor, A. (2007). Sketching Information Divergences. In: Bshouty, N.H., Gentile, C. (eds) Learning Theory. COLT 2007. Lecture Notes in Computer Science, vol 4539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_31

  • DOI: https://doi.org/10.1007/978-3-540-72927-3_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72925-9

  • Online ISBN: 978-3-540-72927-3

  • eBook Packages: Computer Science, Computer Science (R0)
