Abstract
When comparing discrete probability distributions, the natural measures of similarity are not ℓ_p distances but information divergences such as the Kullback-Leibler and Hellinger divergences. This paper considers some of the issues in constructing small-space sketches of distributions, a concept related to dimensionality reduction, such that these measures can be approximately computed from the sketches. The corresponding problems for ℓ_p distances are reasonably well understood through a series of results including those of Johnson and Lindenstrauss [27,18], Alon, Matias, and Szegedy [1], Indyk [24], and Brinkman and Charikar [8]. In contrast, almost no analogous results are known to date about constructing sketches for the information divergences used in statistics and learning theory.
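To make the contrast between ℓ_p distances and information divergences concrete, the following minimal sketch computes the ℓ_1 distance, the Kullback-Leibler divergence, and a Hellinger divergence for two pairs of small discrete distributions. It is an illustration only, not a construction from the paper: the function names are hypothetical, the natural logarithm is assumed for Kullback-Leibler, and the Hellinger divergence is taken in the squared, unnormalized form Σ_i (√p_i − √q_i)², which is one of several conventions in use.

```python
import math

def l1_distance(p, q):
    """l_1 distance: sum_i |p_i - q_i|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * ln(p_i / q_i), natural log (assumed convention)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * ln(0 / q_i) is taken to be 0
        if qi == 0.0:
            return math.inf  # p puts mass where q puts none
        total += pi * math.log(pi / qi)
    return total

def hellinger_sq(p, q):
    """Squared, unnormalized Hellinger: sum_i (sqrt(p_i) - sqrt(q_i))^2 (assumed convention)."""
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

# Two illustrative pairs with (approximately) the same l_1 distance of 0.2.
pair_a = ([0.5, 0.5], [0.6, 0.4])
pair_b = ([0.9, 0.1], [1.0, 0.0])

for p, q in (pair_a, pair_b):
    print("l1 =", l1_distance(p, q),
          "KL =", kl_divergence(p, q),
          "Hellinger^2 =", hellinger_sq(p, q))
```

Both pairs are equally far apart in ℓ_1, yet the Kullback-Leibler divergence is small for the first pair and infinite for the second, because the second distribution assigns zero probability to an outcome the first considers possible. This sensitivity to relative rather than absolute differences is what an ℓ_p distance does not capture.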
References
Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58(1), 137–147 (1999)
Amari, S.-I.: Differential-geometrical methods in statistics. Springer-Verlag, New York (1985)
Amari, S.-I., Nagaoka, H.: Methods of Information Geometry. Oxford University and AMS Translations of Mathematical Monographs (2000)
Bhuvanagiri, L., Ganguly, S., Kesh, D., Saha, C.: Simpler algorithm for estimating frequency moments of data streams. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 708–713 (2006)
Bose, P., Kranakis, E., Morin, P., Tang, Y.: Bounds for frequency estimation of packet streams. In: SIROCCO, pp. 33–42 (2003)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. U.S.S.R. Computational Mathematics and Mathematical Physics 7(1), 200–217 (1967)
Breiman, L.: Prediction games and arcing algorithms. Neural Computation 11(7), 1493–1517 (1999)
Brinkman, B., Charikar, M.: On the impossibility of dimension reduction in ℓ_1. In: IEEE Symposium on Foundations of Computer Science, pp. 514–523 (2003)
Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. J. Comput. Syst. Sci. 60(3), 630–659 (2000)
Chakrabarti, A., Cormode, G., McGregor, A.: A near-optimal algorithm for computing the entropy of a stream. In: ACM-SIAM Symposium on Discrete Algorithms (2007)
Chakrabarti, A., Khot, S., Sun, X.: Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In: IEEE Conference on Computational Complexity, pp. 107–117 (2003)
Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: International Colloquium on Automata, Languages and Programming, pp. 693–703 (2002)
Collins, M., Schapire, R.E., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. Machine Learning 48(1-3), 253–285 (2002)
Cormode, G., Datar, M., Indyk, P., Muthukrishnan, S.: Comparing data streams using Hamming norms (how to zero in). IEEE Trans. Knowl. Data Eng. 15(3), 529–540 (2003)
Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley Series in Telecommunications. John Wiley & Sons, New York, NY, USA (1991)
Csiszár, I.: Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Statist., pp. 2032–2056 (1991)
Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22(1), 60–65 (2003)
Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: ESA, pp. 348–360 (2002)
Feigenbaum, J., Kannan, S., Strauss, M., Viswanathan, M.: An approximate L1 difference algorithm for massive data streams. SIAM Journal on Computing 32(1), 131–151 (2002)
Guha, S., McGregor, A.: Space-efficient sampling. In: AISTATS, pp. 169–176 (2007)
Guha, S., McGregor, A., Venkatasubramanian, S.: Streaming and sublinear approximation of entropy and information distances. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 733–742 (2006)
Henzinger, M.R., Raghavan, P., Rajagopalan, S.: Computing on data streams. External memory algorithms, pp. 107–118 (1999)
Indyk, P.: Stable distributions, pseudorandom generators, embeddings and data stream computation. In: IEEE Symposium on Foundations of Computer Science, pp. 189–197 (2000)
Indyk, P., Woodruff, D.P.: Optimal approximations of the frequency moments of data streams. In: ACM Symposium on Theory of Computing, pp. 202–208 (2005)
Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Annals of Statistics 28, 337–407 (2000)
Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics 26, 189–206 (1984)
Kivinen, J., Warmuth, M.K.: Boosting as entropy projection. In: COLT, pp. 134–144 (1999)
Lafferty, J.D.: Additive models, boosting, and inference for generalized divergences. In: COLT, pp. 125–133 (1999)
Lafferty, J.D., Pietra, S.D., Pietra, V.J.D.: Statistical learning algorithms based on Bregman distances. In: Canadian Workshop on Information Theory (1997)
Liese, F., Vajda, I.: Convex statistical distances. Teubner-Texte zur Mathematik, Band 95, Leipzig (1987)
Mason, L., Baxter, J., Bartlett, P., Frean, M.: Functional gradient techniques for combining hypotheses. In: Advances in Large Margin Classifiers, MIT Press, Cambridge (1999)
Misra, J., Gries, D.: Finding repeated elements. Sci. Comput. Program. 2(2), 143–152 (1982)
Nguyen, X., Wainwright, M.J., Jordan, M.I.: Divergences, surrogate loss functions and experimental design. In: Proceedings of NIPS (2005)
Razborov, A.A.: On the distributional complexity of disjointness. Theor. Comput. Sci. 106(2), 385–390 (1992)
Saks, M.E., Sun, X.: Space lower bounds for distance approximation in the data stream model. In: ACM Symposium on Theory of Computing, pp. 360–369 (2002)
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Guha, S., Indyk, P., McGregor, A. (2007). Sketching Information Divergences. In: Bshouty, N.H., Gentile, C. (eds) Learning Theory. COLT 2007. Lecture Notes in Computer Science, vol 4539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_31
DOI: https://doi.org/10.1007/978-3-540-72927-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72925-9
Online ISBN: 978-3-540-72927-3
eBook Packages: Computer Science (R0)