CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data

  • Conference paper
Advances in Databases: Concepts, Systems and Applications (DASFAA 2007)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4443))

Abstract

Recently, frequent itemset mining over data streams has attracted much attention. However, mining closed itemsets from data streams has not been well addressed. The main difficulty lies in the high maintenance complexity caused by the \(\underline{exact}\) model definition of closed itemsets and the dynamic changes of data streams. In a data stream scenario, it is sufficient to mine only approximate frequent closed itemsets rather than itemsets in full precision. Such a compact but close-enough frequent itemset is called a relaxed frequent closed itemset.
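
To make the maintenance difficulty concrete, the following minimal sketch uses the standard exact definition (an itemset is closed if no proper superset has the same support) and shows how a single new transaction changes the set of frequent closed itemsets. It is a brute-force illustration only, not the method of the paper; the function names `support_counts` and `frequent_closed_itemsets` and the toy transactions are assumptions made for this example.

```python
from itertools import combinations


def support_counts(transactions):
    """Count the support of every itemset contained in at least one transaction.

    Brute force (exponential in transaction length); for toy data only.
    """
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def frequent_closed_itemsets(transactions, min_support):
    """Return {itemset: support} for the exact frequent closed itemsets."""
    counts = support_counts(transactions)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    closed = {}
    for s, c in frequent.items():
        # Closed: no proper superset has exactly the same support.
        # Any such superset would itself be frequent, so checking
        # within `frequent` is sufficient.
        if not any(s < other and c == oc for other, oc in frequent.items()):
            closed[s] = c
    return closed


# Hypothetical toy stream.
window = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
before = frequent_closed_itemsets(window, min_support=2)

# One more transaction arrives: {"b"} now has support 3 while {"a", "b"}
# stays at 2, so {"b"} becomes a new closed itemset.
after = frequent_closed_itemsets(window + [{"b"}], min_support=2)
```

In this toy run `{"b"}` is absent from `before` and present in `after`, although no new item appeared. Under the exact definition, every such equality of supports has to be rechecked as the stream evolves.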

In this paper, we first introduce the concept of \(\mathcal{RC}\) (Relaxed frequent Closed Itemsets), which is the generalized form of approximation. We also propose a novel mechanism, CLAIM (CLosed Approximated Itemset Mining), to support efficient mining of \(\mathcal{RC}\). CLAIM adopts a bipartite graph model to store frequent closed itemsets, uses a Bloom filter based hash function to speed up the update of drifted itemsets, and builds a compact HR-tree structure to efficiently maintain the \(\mathcal{RC}\)s and support the mining process. An experimental study is conducted, and the results demonstrate the effectiveness and efficiency of our approach in handling frequent closed itemset mining over data streams.
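
The abstract does not specify how CLAIM builds its Bloom filter based hash function, so the sketch below shows only a generic Bloom filter keyed on itemsets, to illustrate the kind of fast approximate membership test such a structure offers. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of SHA-256 are assumptions for illustration, not details taken from the paper.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter over itemsets (sets of strings).

    Membership tests may give false positives but never false negatives.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        # num_hashes must stay below 256 because the hash index is
        # encoded as a single salt byte.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, itemset):
        # Canonical key: sorted items joined by a separator, so that the
        # same itemset always maps to the same bit positions.
        key = "\x1f".join(sorted(itemset)).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, itemset):
        for p in self._positions(itemset):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, itemset):
        return all(
            self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(itemset)
        )


# Hypothetical usage: remember which itemsets are already tracked.
seen = BloomFilter()
seen.add({"a", "b"})
```

A lookup such as `{"b", "a"} in seen` then costs a fixed number of hash evaluations regardless of how many itemsets were added, at the price of occasional false positives.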

This work is supported by the National Natural Science Foundation of China under Grants No. 60473051 and No. 60642004, and by the HP and IBM Joint Research Project.

Editor information

Ramamohanarao Kotagiri, P. Radha Krishna, Mukesh Mohania, Ekawit Nantajeewarawat

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Song, G., Yang, D., Cui, B., Zheng, B., Liu, Y., Xie, K. (2007). CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_56

  • DOI: https://doi.org/10.1007/978-3-540-71703-4_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71702-7

  • Online ISBN: 978-3-540-71703-4

  • eBook Packages: Computer Science, Computer Science (R0)
