Abstract
In data stream applications, a good approximation obtained in a timely manner is often better than the exact answer that’s delayed beyond the window of opportunity. Of course, the quality of the approximate is as important as its timely delivery. Unfortunately, algorithms capable of online processing do not conform strictly to a precise error guarantee. Since online processing is essential and so is the precision of the error, it is necessary that stream algorithms meet both criteria. Yet, this is not the case for mining frequent sets in data streams. We present EStream, a novel algorithm that allows online processing while producing results strictly within the error bound. Our theoretical and experimental results show that EStream is a better candidate for finding frequent sets in data streams, when both constraints need to be satisfied.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: VLDB Conference, pp. 487–499 (1994)
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: PODS Conference, pp. 1–16 (2002)
Babcock, B., Datar, M., Motwani, R.: Sampling from a moving window over streaming data. In: ACM-SIAM Symposium on Discrete Algorithms (2002)
Chang, J.H., Lee, W.S.: Estwin: Adaptively monitoring the recent change of frequent itemsets over online data streams. In: CIKM Conference (2003)
Chang, J.H., Lee, W.S.: Finding recent frequent itemsets adaptively over online data streams. In: ACM SIGKDD Conference, pp. 487–492 (2003)
Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005)
Dang, X.H., Ng, W.K., Ong, K.L.: Online mining of frequent patterns with precise error guarantees. Technical Report, Nanyang Technological University
Garofalakis, M., Gehrke, J., Rastogi, R.: Querying and mining data streams: you only get one look a tutorial. In: ACM SIGMOD Conference (2002)
Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.S.: Mining Frequent Patterns in Data Streams at Multiple Time Granularities. AAAI/MIT (2003)
Hidber, C.: Online association rule mining. In: SIGMOD Conference (1999)
Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: VLDB Conference, pp. 346–357 (2002)
Tatbul, N., Çetintemel, U., Zdonik, S.B., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. In: VLDB Conference, pp. 309–320 (2003)
Teng, W.G., Chen, M.S., Yu, P.S.: A regression-based temporal pattern mining scheme for data streams. In: VLDB Conference, pp. 93–104 (2003)
Yu, J.X., Lu, Z.C.H., Zhou, A.: False positive or false negative: Mining frequent itemsets from high speed transactional data streams. In: VLDB Conference (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dang, X.H., Ng, WK., Ong, KL. (2006). EStream: Online Mining of Frequent Sets with Precise Error Guarantee. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2006. Lecture Notes in Computer Science, vol 4081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823728_30
Download citation
DOI: https://doi.org/10.1007/11823728_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37736-8
Online ISBN: 978-3-540-37737-5
eBook Packages: Computer ScienceComputer Science (R0)