Abstract
Bloom Filters are space- and time-efficient randomized data structures for representing (multi-)sets with certain allowable errors, and are widely used in many applications. Previous work on Bloom Filters considered how to support insertions, deletions, membership queries, and multiplicity queries over (multi-)sets. In this paper, we introduce two novel algorithms for computing cardinalities of multi-sets represented by Bloom Filters, which extend the functionality of the Bloom Filter and thus make it usable in a variety of new applications. The Bloom structure presented in previous work is used without any modification, and our algorithms do not affect the existing functionality. Because Bloom Filters can then support cardinality computing alongside insertions, deletions, membership queries, and multiplicity queries, our work is a new step towards fully representing multi-sets by Bloom Filters. Performance analysis and experimental results show how the two algorithms differ and show that our algorithms perform well in most cases.
Supported by the State Key Laboratory of Networking and Switching Technology, NSFC Grants 60473051 and 60503037, and NSFBC Grant 4062018.
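The abstract does not spell out the two cardinality algorithms, so the following is only a minimal sketch of the kind of structure it refers to: a counter-array Bloom Filter supporting insertions, deletions, membership queries, and multiplicity queries, plus two textbook cardinality estimates. Everything here is an assumption made for illustration and not the authors' method: the class name `CountingBloomFilter`, the SHA-256 double-hashing scheme, the parameters `m` and `k`, and both estimators (counter sum divided by `k` for the total number of items counted with multiplicity, and the zero-counter fraction for the number of distinct items).

```python
import hashlib
import math


class CountingBloomFilter:
    """Counter-array Bloom Filter (illustrative sketch, not the paper's code)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                  # number of counters (assumed parameter)
        self.k = k                  # number of hash positions per item (assumed)
        self.counters = [0] * m

    def _positions(self, item: str) -> list:
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] += 1

    def delete(self, item: str) -> None:
        # Only meaningful for items that were actually inserted before.
        for p in self._positions(item):
            self.counters[p] -= 1

    def multiplicity(self, item: str) -> int:
        # Minimum-counter estimate: never below the true count.
        return min(self.counters[p] for p in self._positions(item))

    def contains(self, item: str) -> bool:
        return self.multiplicity(item) > 0

    def total_cardinality(self) -> float:
        # Each insertion adds exactly k to the counter sum.
        return sum(self.counters) / self.k

    def distinct_estimate(self) -> float:
        # Expected zero fraction after n distinct items is about exp(-k*n/m),
        # so n is estimated as -(m/k) * ln(zeros/m).
        zeros = self.counters.count(0)
        if zeros == 0:
            return float("inf")  # filter saturated; no usable estimate
        return -(self.m / self.k) * math.log(zeros / self.m)
```

In this sketch both estimates read only the existing counter array, so they leave insertions, deletions, and the two query types untouched, which mirrors the abstract's claim that the structure is used without modification. The total is exact as long as no counter overflows; the distinct-item figure is an approximation that degrades as the filter fills.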
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhao, J., Yang, D., Chen, L., Gao, J., Wang, T. (2006). Cardinality Computing: A New Step Towards Fully Representing Multi-sets by Bloom Filters. In: Aberer, K., Peng, Z., Rundensteiner, E.A., Zhang, Y., Li, X. (eds) Web Information Systems – WISE 2006. WISE 2006. Lecture Notes in Computer Science, vol 4255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11912873_26
DOI: https://doi.org/10.1007/11912873_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-48105-8
Online ISBN: 978-3-540-48107-2
eBook Packages: Computer Science, Computer Science (R0)