CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data

Song, Guojie; Yang, Dongqing; Cui, Bin; Zheng, Baihua; Liu, Yunfeng; Xie, Kunqing

doi:10.1007/978-3-540-71703-4_56

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4443)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Recently, mining frequent itemsets over data streams has attracted much attention. However, mining closed itemsets from data streams has not been well addressed. The main difficulty lies in the high maintenance cost caused by the exact definition of closed itemsets and by the dynamic nature of data streams. In a data stream scenario, it is sufficient to mine approximate frequent closed itemsets rather than fully precise ones. Such a compact but close-enough frequent itemset is called a relaxed frequent closed itemset.
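To make the distinction concrete, the following is a minimal brute-force sketch, not the paper's algorithm. With `tolerance=0` it applies the standard exact definition: a frequent itemset is closed if no proper superset has the same support. A positive `tolerance` illustrates one naive way to relax that test, by letting a frequent superset absorb an itemset when their supports differ by at most `tolerance`. The function names, the `tolerance` parameter, and this particular relaxation rule are assumptions made for illustration; the abstract does not state the formal definition of \(\mathcal{RC}\).

```python
from itertools import combinations


def support_counts(transactions):
    """Count the support of every non-empty itemset that occurs (brute force)."""
    counts = {}
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def closed_itemsets(transactions, min_support, tolerance=0):
    """Return {itemset: support} for frequent itemsets that are not absorbed.

    tolerance=0 gives exact closed itemsets. tolerance>0 is a hypothetical
    relaxation (an assumption, not the paper's definition): an itemset is
    dropped when some frequent proper superset has support within
    `tolerance` of its own.
    """
    counts = support_counts(transactions)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    kept = {}
    for itemset, count in frequent.items():
        absorbed = any(
            itemset < other and count - other_count <= tolerance
            for other, other_count in frequent.items()
        )
        if not absorbed:
            kept[itemset] = count
    return kept


# Toy data, invented for this example.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "d"}]
exact = closed_itemsets(transactions, min_support=2)
relaxed = closed_itemsets(transactions, min_support=2, tolerance=1)
```

On this toy data the exact test keeps {a}, {a, b}, and {a, b, c}, while a tolerance of 1 keeps only {a, b, c}. The relaxed result is smaller, at the price of approximate support information for the absorbed itemsets.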

In this paper, we first introduce the concept of \(\mathcal{RC}\) (Relaxed frequent Closed itemsets), which is a generalized form of approximation. We also propose a novel mechanism, CLAIM (CLosed Approximated Itemset Mining), to support efficient mining of \(\mathcal{RC}\). CLAIM adopts a bipartite graph model to store frequent closed itemsets, uses a Bloom-filter-based hash function to speed up the update of drifted itemsets, and builds a compact HR-tree structure to efficiently maintain the \(\mathcal{RC}\)s and support the mining process. An experimental study is conducted, and the results demonstrate the effectiveness and efficiency of our approach in handling frequent closed itemset mining over data streams.
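As background for the Bloom filter component, here is a minimal sketch of a generic Bloom filter keyed by itemsets. It shows only the standard data structure: a bit array with several hash positions per key, no false negatives, and possible false positives. How CLAIM actually integrates the filter with its bipartite graph and HR-tree is not described in the abstract, so the class name, the parameters `num_bits` and `num_hashes`, the assumption that items are strings, and the SHA-256 based hashing are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter over itemsets (illustrative; not CLAIM's design)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # assumed to be below 256
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, itemset):
        # Canonical key: sorted string items joined by a separator, so that
        # {"a", "b"} and {"b", "a"} map to the same bit positions.
        key = "\x1f".join(sorted(itemset)).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, itemset):
        for pos in self._positions(itemset):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, itemset):
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(itemset)
        )


seen = BloomFilter()
seen.add({"a", "b"})
```

A membership test of this kind costs a fixed number of hash evaluations regardless of how many itemsets are stored, which is the usual reason to place a Bloom filter in front of a more expensive lookup.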

This work is supported by the National Natural Science Foundation of China under Grants No. 60473051 and No. 60642004, and by the HP and IBM Joint Research Project.

Author information

Authors and Affiliations

School of Electronic Engineering and Computer Science, Peking University, Beijing, China
Guojie Song, Dongqing Yang & Bin Cui
School of Information System, Singapore Management University, Singapore
Baihua Zheng
Computer Center of Peking University, Beijing
Yunfeng Liu
National Laboratory on Machine Perception, Peking University, Beijing
Guojie Song & Kunqing Xie

Editor information

Ramamohanarao Kotagiri, P. Radha Krishna, Mukesh Mohania, Ekawit Nantajeewarawat

About this paper

Cite this paper

Song, G., Yang, D., Cui, B., Zheng, B., Liu, Y., Xie, K. (2007). CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_56

DOI: https://doi.org/10.1007/978-3-540-71703-4_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71702-7
Online ISBN: 978-3-540-71703-4
eBook Packages: Computer Science, Computer Science (R0)
