Abstract
Most algorithms for discovering frequent patterns in data streams assume that the system can process every incoming transaction without delay and without having to drop any transactions. However, this assumption is often unrealistic given the inherent characteristics of data stream environments. Under high load in particular, the system often lacks the resources to process incoming transactions in time. The resulting latencies in turn reduce the usefulness of the mined models, which often have only a short window of opportunity. We propose a load shedding algorithm to address this issue. The algorithm adaptively detects overload situations and drops transactions from the data stream according to a probabilistic model. We tested the algorithm on both synthetic and real-life datasets to verify its feasibility.
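The abstract does not spell out the probabilistic model, so the following is not the authors' algorithm. It is a minimal sketch of generic random load shedding under stated assumptions: the miner can process a fixed, known number of transactions per time unit (the hypothetical `capacity` parameter), overload is detected by comparing the size of each time unit's batch against that capacity, and during overload each transaction is kept independently with probability p = capacity / arrival rate. The names `RandomLoadShedder` and `estimate_support_count` are illustrative only.

```python
import random


class RandomLoadShedder:
    """Hypothetical sketch: drop transactions uniformly at random under overload.

    Assumes `capacity` is the number of transactions the miner can process
    per time unit, and that each call to `shed` receives one time unit's batch.
    """

    def __init__(self, capacity, seed=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.rng = random.Random(seed)

    def keep_probability(self, arrival_rate):
        # No overload: keep every transaction.
        if arrival_rate <= self.capacity:
            return 1.0
        # Overload: keep just enough to match capacity in expectation.
        return self.capacity / arrival_rate

    def shed(self, batch):
        # Overload detection: compare this time unit's arrivals to capacity.
        p = self.keep_probability(len(batch))
        # random() lies in [0, 1), so p == 1.0 keeps everything.
        return [t for t in batch if self.rng.random() < p]


def estimate_support_count(kept_batch, itemset, keep_prob):
    # Each kept transaction stands for 1/p original ones, so scaling the
    # observed count by 1/p gives an unbiased estimate of the true count.
    hits = sum(1 for t in kept_batch if itemset <= t)
    return hits / keep_prob
```

A usage example with a synthetic batch of 5,000 transactions against an assumed capacity of 1,000:

```python
shedder = RandomLoadShedder(capacity=1000, seed=7)
batch = [frozenset({"a", "b"}) if i % 4 == 0 else frozenset({"c"}) for i in range(5000)]
p = shedder.keep_probability(len(batch))  # 0.2
kept = shedder.shed(batch)                # roughly 1,000 transactions survive
est = estimate_support_count(kept, frozenset({"a"}), p)  # close to the true count of 1,250
```

Uniform random dropping is chosen here only because it keeps support estimates unbiased; an adaptive scheme such as the one the abstract describes may instead decide which transactions to drop based on the current mining state.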
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dang, X.H., Ng, WK., Ong, KL. (2006). Adaptive Load Shedding for Mining Frequent Patterns from Data Streams. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2006. Lecture Notes in Computer Science, vol 4081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823728_33
DOI: https://doi.org/10.1007/11823728_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37736-8
Online ISBN: 978-3-540-37737-5
eBook Packages: Computer Science, Computer Science (R0)