Abstract
We introduce a new sublinear space data structure, the Count-Min Sketch (CM sketch), for summarizing data streams. Our sketch allows fundamental queries in data stream summarization, such as point, range, and inner product queries, to be answered approximately and very quickly. In addition, it can be applied to solve several important problems in data streams, such as finding quantiles and frequent items. The time and space bounds we show for using the CM sketch to solve these problems significantly improve those previously known, typically from a factor of 1/ε² to a factor of 1/ε.
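To make the point and inner product queries concrete, here is a minimal Python sketch of the data structure. It is an illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, items are assumed to be non-negative integers, updates are assumed to be non-negative, and the hash family `((a*x + b) mod p) mod w` is one common way to obtain pairwise-independent hashes. The sizing used here (width ⌈e/ε⌉, depth ⌈ln(1/δ)⌉) is the one usually quoted for the CM sketch; the abstract itself does not spell it out.

```python
import math
import random


class CountMinSketch:
    """Illustrative CM sketch over non-negative integer items (hypothetical API)."""

    PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus (an assumption)

    def __init__(self, epsilon, delta, seed=0):
        # Commonly quoted sizing: width = ceil(e / epsilon), depth = ceil(ln(1 / delta)).
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        rng = random.Random(seed)
        # One (a, b) pair per row defines the hash x -> ((a*x + b) mod p) mod width.
        self.hashes = [
            (rng.randrange(1, self.PRIME), rng.randrange(0, self.PRIME))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # sum of all update counts seen so far

    def _index(self, row, item):
        a, b = self.hashes[row]
        return ((a * item + b) % self.PRIME) % self.width

    def update(self, item, count=1):
        """Add `count` (assumed non-negative) to one counter in every row."""
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        """Point query: the minimum counter over all rows.

        With non-negative updates this never underestimates the true count.
        """
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )

    def inner_product(self, other):
        """Inner product query against a sketch built with the same hashes."""
        if self.hashes != other.hashes or self.width != other.width:
            raise ValueError("sketches must share parameters and seed")
        return min(
            sum(x * y for x, y in zip(row_a, row_b))
            for row_a, row_b in zip(self.table, other.table)
        )


# Usage: summarize a small stream and query one item.
sketch = CountMinSketch(epsilon=0.01, delta=0.01)
for item in [7, 7, 7, 42, 42, 1000]:
    sketch.update(item)
approx_count_of_7 = sketch.estimate(7)  # at least 3, the true count
```

The table holds width × depth counters regardless of stream length, which is where the sublinear space comes from. Taking the minimum across rows is what limits the overestimate that hash collisions introduce in any single row.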
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cormode, G., Muthukrishnan, S. (2004). An Improved Data Stream Summary: The Count-Min Sketch and Its Applications. In: Farach-Colton, M. (eds) LATIN 2004: Theoretical Informatics. LATIN 2004. Lecture Notes in Computer Science, vol 2976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24698-5_7
DOI: https://doi.org/10.1007/978-3-540-24698-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21258-4
Online ISBN: 978-3-540-24698-5