Abstract
Frequent itemset mining over data streams has recently attracted much attention. However, mining closed itemsets from data streams has not been well addressed. The main difficulty lies in the high maintenance cost caused by the \(\underline{exact}\) model definition of closed itemsets and the dynamic nature of data streams. In a data stream scenario, it is sufficient to mine approximate frequent closed itemsets rather than fully precise ones. Such a compact but close-enough frequent itemset is called a relaxed frequent closed itemset.
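To make the distinction between exact and approximate closedness concrete, the following is a minimal brute-force sketch. With `tolerance=0` it uses the standard definition of a closed itemset (no proper superset has the same support). The `tolerance` parameter is a hypothetical relaxation introduced here for illustration only; it is not the formal definition of a relaxed frequent closed itemset given in the paper, and the function names are assumptions.

```python
from itertools import combinations


def support_counts(transactions):
    """Count the support of every non-empty itemset by brute force."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def closed_itemsets(transactions, min_support, tolerance=0):
    """Return frequent itemsets not absorbed by a frequent proper superset.

    tolerance=0: the exact closed-itemset definition.
    tolerance>0: an illustrative relaxation (an assumption, not the paper's
    definition) in which an itemset is absorbed by a frequent proper superset
    whose support is lower by at most `tolerance`.
    """
    counts = support_counts(transactions)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = {}
    for s, c in frequent.items():
        # `s < t` tests for a proper subset relation on frozensets.
        absorbed = any(s < t and c - ct <= tolerance
                       for t, ct in frequent.items())
        if not absorbed:
            result[s] = c
    return result


transactions = [["a", "b", "c"], ["a", "b"], ["a", "b"], ["a", "c"]]
exact = closed_itemsets(transactions, min_support=2)
relaxed = closed_itemsets(transactions, min_support=2, tolerance=1)
```

On this toy input the exact result keeps {a}, {a, b} and {a, c}, while the relaxed variant also absorbs {a} into {a, b}, giving a smaller summary at the cost of a support error of at most `tolerance`. Brute-force enumeration is exponential in transaction length and is suitable only for illustration.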
In this paper, we first introduce the concept of \(\mathcal{RC}\) (Relaxed frequent Closed itemsets), which is the generalized form of approximation. We also propose a novel mechanism, CLAIM (CLosed Approximated Itemset Mining), to support efficient mining of \(\mathcal{RC}\). CLAIM adopts a bipartite graph model to store frequent closed itemsets, uses a Bloom filter based hash function to speed up the update of drifted itemsets, and builds a compact HR-tree structure to efficiently maintain the \(\mathcal{RC}\)s and support the mining process. An experimental study is conducted, and the results demonstrate the effectiveness and efficiency of our approach for mining frequent closed itemsets over data streams.
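The abstract does not describe how CLAIM uses its Bloom filter, so the following is only a generic sketch of the data structure itself: a fixed-size bit array with several hash positions per key, supporting fast membership tests that may return false positives but never false negatives. The class name, the parameters `num_bits` and `num_hashes`, and the choice of SHA-256 as the hash are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter keyed by itemsets of string items."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # must be below 256 for the seed byte
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, itemset):
        # Sort the items so that the same itemset always yields the same key.
        key = "\x1f".join(sorted(itemset)).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, itemset):
        for pos in self._positions(itemset):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, itemset):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(itemset))


bf = BloomFilter()
empty_lookup = bf.might_contain({"a", "b"})   # nothing inserted yet
bf.add({"a", "b"})
hit = bf.might_contain(["b", "a"])            # item order does not matter
```

A filter like this can cheaply rule out itemsets that are certainly not being tracked before a more expensive structure is consulted; whether CLAIM applies it in this way is not stated in the abstract.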
This work is supported by the National Natural Science Foundation of China under Grant Nos. 60473051 and 60642004, and by the HP and IBM Joint Research Project.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Song, G., Yang, D., Cui, B., Zheng, B., Liu, Y., Xie, K. (2007). CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_56
DOI: https://doi.org/10.1007/978-3-540-71703-4_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71702-7
Online ISBN: 978-3-540-71703-4
eBook Packages: Computer Science, Computer Science (R0)