Mining Maximal Frequent Itemsets in Data Streams Based on FP-Tree

Ao, Fujiang; Yan, Yuejin; Huang, Jian; Huang, Kedi

doi:10.1007/978-3-540-73499-4_36

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

Mining maximal frequent itemsets (MFIs) in data streams is more difficult than mining them in static databases because data streams are huge, high-speed, and continuous. In this paper, we propose a novel one-pass algorithm, FpMFI-DS, which uses an FP-Tree to mine all maximal frequent itemsets in landmark windows or sliding windows over a data stream. A new FP-Tree structure is designed to store all transactions in the landmark or sliding window. To improve the efficiency of the algorithm, we introduce a new pruning technique, extension support equivalency pruning (ESEquivPS). Experiments show that the algorithm is efficient and scalable, and that it is suitable for mining MFIs in both static databases and data streams.
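
To make the terms in the abstract concrete, the following minimal Python sketch stores the transactions of a window in a count-annotated prefix tree and lists the maximal frequent itemsets of that window. It is not FpMFI-DS: the abstract does not give the paper's tree layout or the ESEquivPS rule, so neither is reproduced here. The names `Node`, `WindowTree`, and `maximal_frequent_itemsets` are hypothetical, items are ordered lexicographically rather than by frequency as in a standard FP-Tree, and the mining step is a brute-force enumeration that is only practical for a handful of items.

```python
from collections import deque
from itertools import combinations


class Node:
    """Prefix-tree node: an item and the number of window transactions through it."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class WindowTree:
    """Prefix tree over the most recent `size` transactions (hypothetical helper)."""

    def __init__(self, size):
        self.size = size          # float("inf") never evicts (landmark-style window)
        self.root = Node()
        self.window = deque()     # transactions currently stored, oldest first

    def _update(self, items, delta):
        node = self.root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += delta
            if child.count == 0:
                # Only the evicted transaction used this branch: drop it.
                del node.children[item]
                return
            node = child

    def add(self, transaction):
        """Insert one transaction and evict the oldest one if the window is full."""
        items = tuple(sorted(set(transaction)))
        self.window.append(items)
        self._update(items, +1)
        if len(self.window) > self.size:
            self._update(self.window.popleft(), -1)

    def support(self, itemset):
        """Number of window transactions that contain every item of `itemset`."""
        target = sorted(itemset)
        if not target:
            return len(self.window)

        def walk(node, i):
            if i == len(target):
                return node.count
            total = 0
            for item, child in node.children.items():
                if item == target[i]:
                    total += walk(child, i + 1)   # matched the next wanted item
                elif item < target[i]:
                    total += walk(child, i)       # wanted item may lie deeper
            return total

        return walk(self.root, 0)


def maximal_frequent_itemsets(tree, min_support):
    """Brute force: frequent itemsets that have no frequent proper superset."""
    items = sorted({i for t in tree.window for i in t})
    frequent = [frozenset(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)
                if tree.support(c) >= min_support]
    return [s for s in frequent if not any(s < t for t in frequent)]


if __name__ == "__main__":
    tree = WindowTree(size=4)
    for t in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c", "d"], ["a", "b", "c"]):
        tree.add(t)
    # The first transaction has been evicted, so {a, b, c} occurs only once.
    print(sorted(sorted(s) for s in maximal_frequent_itemsets(tree, min_support=2)))
    # prints [['a', 'b'], ['a', 'c'], ['b', 'c']]
```

Creating the tree with `size=float("inf")` keeps every transaction seen so far, which corresponds to a landmark window that starts at the first transaction; a finite `size` gives a sliding window.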

Author information

Authors and Affiliations

School of Mechanical Engineering and Automation, National University of Defense Technology, Changsha, 410073, China
Fujiang Ao, Jian Huang & Kedi Huang
School of Computer Science, National University of Defense Technology, Changsha, 410073, China
Yuejin Yan

Editor information

Petra Perner

About this paper

Cite this paper

Ao, F., Yan, Y., Huang, J., Huang, K. (2007). Mining Maximal Frequent Itemsets in Data Streams Based on FP-Tree. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_36

DOI: https://doi.org/10.1007/978-3-540-73499-4_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)
