A False Negative Maximal Frequent Itemset Mining Algorithm over Stream

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7120)

Abstract

Maximal frequent itemsets are one of several condensed representations of frequent itemsets; they store most of the information contained in the frequent itemsets in less space and are therefore better suited to stream mining. This paper focuses on approximately mining maximal frequent itemsets over a stream under the landmark model. We separate the continuously arriving transactions into sections and maintain them with 3-tuple lists indexed by an extended direct update tree, and on this basis propose an efficient algorithm named FNMFIMoDS. The algorithm employs the Chernoff bound to mine maximal frequent itemsets in a false-negative manner. In addition, it classifies the itemsets into categories and prunes redundant itemsets, which further reduces the memory cost and ensures that the algorithm runs incrementally. Experimental results on two synthetic datasets and two real-world datasets show that FNMFIMoDS maintains high precision while running faster and using much less memory than the state-of-the-art algorithm.
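
To make the two central notions in the abstract concrete, the following is a minimal sketch and not the FNMFIMoDS algorithm itself. It assumes a Chernoff-style error bound of the form ε_n = sqrt(2·s·ln(2/δ)/n), a form commonly used in false-negative stream mining; the abstract does not state the exact bound the authors use. The function names, the brute-force counting, and the toy transactions are all hypothetical and chosen only for illustration. An itemset is maximal if it is frequent and has no frequent proper superset.

```python
import math
from itertools import combinations


def chernoff_epsilon(min_support, delta, n):
    """Assumed bound: epsilon_n = sqrt(2 * s * ln(2 / delta) / n).

    min_support (s) is the relative support threshold, delta the allowed
    failure probability, n the number of transactions seen so far.
    """
    return math.sqrt(2.0 * min_support * math.log(2.0 / delta) / n)


def frequent_itemsets(transactions, threshold):
    """Brute-force counting (illustration only, exponential in transaction size).

    Returns a dict mapping each itemset whose relative support is at least
    `threshold` to that support.
    """
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= threshold}


def maximal_only(itemsets):
    """Keep only itemsets with no proper superset in the collection."""
    sets = list(itemsets)
    return [s for s in sets if not any(s < other for other in sets)]


# Hypothetical toy data, not taken from the paper.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "b", "c"],
    ["a", "c"],
    ["b"],
]
frequent = frequent_itemsets(transactions, threshold=0.6)
maximal = maximal_only(frequent)
```

On this toy data the frequent itemsets at threshold 0.6 are {a}, {b}, {c}, {a, b} and {a, c}, and only {a, b} and {a, c} are maximal. The bound shrinks as more transactions arrive, so under this assumed form a stream miner can start with a relaxed threshold of s − ε_n and tighten it towards s over time; itemsets discarded early may be missed (false negatives), but every reported itemset meets the target threshold with probability tied to δ.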

This research is supported by the National Science Foundation of China (61100112) and the Program for Innovation Research in Central University of Finance and Economics.




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, H., Zhang, N. (2011). A False Negative Maximal Frequent Itemset Mining Algorithm over Stream. In: Tang, J., King, I., Chen, L., Wang, J. (eds) Advanced Data Mining and Applications. ADMA 2011. Lecture Notes in Computer Science, vol 7120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25853-4_3

  • DOI: https://doi.org/10.1007/978-3-642-25853-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25852-7

  • Online ISBN: 978-3-642-25853-4

  • eBook Packages: Computer Science, Computer Science (R0)
