Frequent Pattern Mining in Data Streams

Part of the book series: Advances in Database Systems ((ADBS,volume 31))

Abstract

Frequent pattern mining is a core data mining operation and has been extensively studied over the last decade. Recently, mining frequent patterns over data streams has attracted considerable research interest. Compared with other streaming queries, frequent pattern mining poses great challenges because of its high memory and computational costs and the accuracy required of the mining results.

In this chapter, we survey state-of-the-art techniques for mining frequent patterns over data streams. We also introduce a new approach to this problem, which makes two major contributions. First, it provides a one-pass algorithm for frequent itemset mining that has deterministic bounds on its accuracy and does not require any out-of-core summary structure. Second, because the one-pass algorithm does not produce any false negatives, it can easily be extended to an accurate two-pass algorithm, which is very memory efficient.
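To illustrate why the absence of false negatives makes a two-pass extension straightforward, here is a minimal sketch for the simpler case of single items rather than itemsets. It is not the chapter's algorithm: it uses the well-known counter-based scheme for frequent elements (Misra-Gries, also described by Karp, Papadimitriou, and Shenker), and the function names, the `theta` support threshold, and the `stream_factory` callable are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Set


def candidate_frequent_items(stream: Iterable[Hashable], theta: float) -> Set[Hashable]:
    """First pass: return a superset of the items occurring more than theta * N times.

    Uses at most floor(1 / theta) in-memory counters, so no out-of-core
    summary is needed. The result may contain false positives but never
    misses a truly frequent item (no false negatives).
    """
    if not 0 < theta < 1:
        raise ValueError("theta must be strictly between 0 and 1")
    max_counters = math.floor(1 / theta)
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < max_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


def exact_frequent_items(
    stream_factory: Callable[[], Iterable[Hashable]], theta: float
) -> Dict[Hashable, int]:
    """Two passes: find candidates, then count only the candidates exactly.

    stream_factory (hypothetical) must return a fresh iterator over the
    same data each time it is called, e.g. by re-reading a stored log.
    """
    candidates = candidate_frequent_items(stream_factory(), theta)
    counts: Counter = Counter()
    total = 0
    for item in stream_factory():
        total += 1
        if item in candidates:
            counts[item] += 1
    # Keep only the candidates that really exceed the support threshold.
    return {item: c for item, c in counts.items() if c > theta * total}
```

The second pass needs memory only for the candidate set, which is why a first pass without false negatives leads to a memory-efficient exact method. It does assume the data can be read a second time, which is not possible for every stream.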

Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Jin, R., Agrawal, G. (2007). Frequent Pattern Mining in Data Streams. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_4

  • DOI: https://doi.org/10.1007/978-0-387-47534-9_4

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-28759-1

  • Online ISBN: 978-0-387-47534-9

  • eBook Packages: Computer Science, Computer Science (R0)
