Abstract
In this paper, we introduce a novel data structure, ESBF (Extensible and Scalable Bloom Filter), and an algorithm, FI-ESBF (Finding frequent Items using ESBF), for estimating the frequent items in data streams. FI-ESBF achieves high precision while using much less memory than the best previously reported algorithm, given the large number of distinct items in the stream. ESBF is an extension of the counting Bloom filter (CBF). With it, the amount of memory used can be adjusted dynamically according to the data distribution and the number of distinct items in the stream, so prior knowledge of the data distribution and of the number of distinct elements to be stored is not required.
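The abstract does not give the internals of ESBF, so the following is only a minimal sketch of the structure it extends: a plain, fixed-size counting Bloom filter used for frequency estimation. The class name `CountingBloomFilter`, the helper `frequent_items`, the parameter defaults, and the salted SHA-256 hashing are all assumptions made for illustration; the dynamic resizing that distinguishes ESBF is not attempted here.

```python
import hashlib


class CountingBloomFilter:
    """Fixed-size counting Bloom filter (illustrative; not the paper's ESBF)."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.m = num_counters          # number of counters (assumed default)
        self.k = num_hashes            # number of hash functions (assumed default)
        self.counters = [0] * num_counters

    def _indexes(self, item):
        # Derive k counter positions by salting one hash with the function index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Increment each distinct position once, so one occurrence never
        # raises the same counter twice.
        for idx in set(self._indexes(item)):
            self.counters[idx] += 1

    def estimate(self, item):
        # The smallest counter is an upper bound on the true count:
        # collisions can only inflate counters, never lower them.
        return min(self.counters[idx] for idx in self._indexes(item))


def frequent_items(stream, threshold, cbf=None):
    """Return {item: estimated count} for items whose estimate reaches threshold."""
    cbf = cbf if cbf is not None else CountingBloomFilter()
    candidates = set()
    for item in stream:
        cbf.add(item)
        if cbf.estimate(item) >= threshold:
            candidates.add(item)
    return {item: cbf.estimate(item) for item in candidates}


if __name__ == "__main__":
    stream = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(100)]
    print(frequent_items(stream, threshold=30))
```

In this sketch the number of counters is fixed when the filter is created, which is exactly the limitation the abstract says ESBF removes: a fixed-size filter has to be sized in advance from an estimate of the number of distinct items.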
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, S., Hao, X., Xu, H., Hu, Y. (2007). Finding Frequent Items in Data Streams Using ESBF. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_26
DOI: https://doi.org/10.1007/978-3-540-77018-3_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77016-9
Online ISBN: 978-3-540-77018-3
eBook Packages: Computer Science, Computer Science (R0)