Mining Maximal Frequent Itemsets in Data Streams Based on FP-Tree

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Abstract

Mining maximal frequent itemsets (MFIs) in data streams is more difficult than mining them in static databases because data streams are huge, high-speed, and continuous. In this paper, we propose a novel one-pass algorithm called FpMFI-DS, which mines all maximal frequent itemsets in landmark windows or sliding windows over data streams, based on the FP-Tree. A new FP-Tree structure is designed to store all transactions in the landmark or sliding window. To improve the efficiency of the algorithm, a new pruning technique, extension support equivalency pruning (ESEquivPS), is introduced. Experiments show that the algorithm is efficient and scalable, and that it is suitable for mining MFIs in both static databases and data streams.
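
To make the problem statement concrete, the following is a minimal Python sketch of what "maximal frequent itemsets in a sliding or landmark window" means. It is not FpMFI-DS: it uses neither an FP-Tree nor ESEquivPS pruning, and it simply enumerates candidates by brute force, so it only suits tiny examples. The names `maximal_frequent_itemsets` and `SlidingWindowMiner`, and the use of an absolute support count `min_support`, are assumptions made for illustration and do not come from the paper.

```python
from collections import deque
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force illustration (NOT the paper's algorithm).

    An itemset is frequent if at least `min_support` transactions contain it,
    and maximal if no proper superset of it is also frequent.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions)) if transactions else []
    frequent = []
    for size in range(1, len(items) + 1):
        level = [
            frozenset(candidate)
            for candidate in combinations(items, size)
            if sum(1 for t in transactions if t.issuperset(candidate)) >= min_support
        ]
        if not level:
            # Anti-monotonicity: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
        frequent.extend(level)
    # Keep only itemsets with no frequent proper superset.
    return {s for s in frequent if not any(s < other for other in frequent)}


class SlidingWindowMiner:
    """Hypothetical helper: keeps the most recent `window_size` transactions.

    Passing window_size=None keeps every transaction seen so far, which
    models a landmark window that starts at the first transaction.
    """

    def __init__(self, window_size, min_support):
        self.window = deque(maxlen=window_size)  # oldest entry is evicted when full
        self.min_support = min_support

    def add(self, transaction):
        self.window.append(frozenset(transaction))

    def mine(self):
        return maximal_frequent_itemsets(self.window, self.min_support)


if __name__ == "__main__":
    miner = SlidingWindowMiner(window_size=3, min_support=2)
    for transaction in (["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["b", "d"], ["b", "d"]):
        miner.add(transaction)
        print(sorted(sorted(s) for s in miner.mine()))
```

In this toy stream the result changes as old transactions fall out of the three-transaction window, which is the behaviour a stream algorithm has to track without rescanning the whole window on every update as this sketch does.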

Editor information

Petra Perner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ao, F., Yan, Y., Huang, J., Huang, K. (2007). Mining Maximal Frequent Itemsets in Data Streams Based on FP-Tree. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_36

  • DOI: https://doi.org/10.1007/978-3-540-73499-4_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73498-7

  • Online ISBN: 978-3-540-73499-4

  • eBook Packages: Computer Science, Computer Science (R0)
