Abstract
The duplicate-insensitive, time-decayed sum of an arbitrary subset of a stream is an important aggregate for analyses in many distributed stream scenarios. In general, computing this sum exactly over an unbounded, high-rate stream is infeasible. We therefore address this problem by introducing a sketch, the time-decaying Bloom Filter (TDBF). The TDBF detects duplicates in a stream while dynamically maintaining the decayed weight of every distinct element in the stream according to a user-specified decay function. Given a query for the current decayed sum of a subset of the stream, the TDBF returns an effective estimate. Our theoretical analysis gives a provable approximation guarantee on the error of this estimate, and experimental results on synthetic streams validate that analysis.
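To make the idea concrete, the following is a minimal illustrative sketch, not the TDBF construction from the paper, whose details the abstract does not give. It assumes a Bloom-filter-style array whose cells store the most recent arrival time, that a repeated element refreshes its timestamp rather than adding weight, and that the decay function is non-increasing in age. The class name `TimeDecayingBloomSketch`, its methods, and the parameters `num_cells` and `num_hashes` are all hypothetical.

```python
import hashlib
import math


class TimeDecayingBloomSketch:
    """Illustrative only: cells hold the latest timestamp hashed to them."""

    def __init__(self, num_cells, num_hashes, decay):
        self.cells = [None] * num_cells   # None means "never touched"
        self.num_hashes = num_hashes
        self.decay = decay                # user-specified function: age -> weight

    def _positions(self, item):
        # Derive num_hashes cell indices by salting one hash with the index.
        data = str(item).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % len(self.cells)

    def insert(self, item, timestamp):
        # Assumption: a duplicate only refreshes the stored timestamp,
        # so re-inserting at the same time leaves the sketch unchanged.
        for pos in self._positions(item):
            old = self.cells[pos]
            if old is None or timestamp > old:
                self.cells[pos] = timestamp

    def decayed_weight(self, item, now):
        stamps = [self.cells[pos] for pos in self._positions(item)]
        if any(s is None for s in stamps):
            return 0.0                    # definitely never inserted
        # Collisions can only push a cell's timestamp later, so the oldest
        # cell is the closest available estimate of the true arrival time.
        return self.decay(now - min(stamps))

    def decayed_sum(self, subset, now):
        # Each distinct queried element contributes once.
        return sum(self.decayed_weight(x, now) for x in set(subset))


# Usage with an exponential decay (the rate 0.5 is an arbitrary choice).
sketch = TimeDecayingBloomSketch(1024, 3, lambda age: math.exp(-0.5 * age))
sketch.insert("a", 0)
sketch.insert("a", 0)   # duplicate arrival, no effect
sketch.insert("b", 2)
estimate = sketch.decayed_sum(["a", "b"], now=2)
```

Under these assumptions hash collisions can only make an element look more recent than it is, so the estimate errs on the high side; the paper's own error guarantee applies to its construction, not to this sketch.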
This work is supported by Chinese Academy of Science "100 Talents" Project and National Science Foundation of China under its General Projects funding #60772034. Corresponding author: H. Shen.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Y., Shen, H., Tian, H., Zhang, X. (2009). Dynamically Maintaining Duplicate-Insensitive and Time-Decayed Sum Using Time-Decaying Bloom Filter. In: Hua, A., Chang, SL. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2009. Lecture Notes in Computer Science, vol 5574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03095-6_70
DOI: https://doi.org/10.1007/978-3-642-03095-6_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03094-9
Online ISBN: 978-3-642-03095-6
eBook Packages: Computer Science, Computer Science (R0)