A False Negative Maximal Frequent Itemset Mining Algorithm over Stream

Li, Haifeng; Zhang, Ning

doi:10.1007/978-3-642-25853-4_3

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7120)

Abstract

Maximal frequent itemsets are one of several condensed representations of frequent itemsets: they store most of the information contained in the frequent itemsets in less space, which makes them better suited to stream mining. This paper focuses on approximately mining maximal frequent itemsets over a data stream under the landmark model. We separate the continuously arriving transactions into sections and maintain them with 3-tuple lists indexed by an extended direct update tree, and on this basis propose an efficient algorithm named FNMFIMoDS. The algorithm employs the Chernoff bound to mine maximal frequent itemsets in a false negative manner. In addition, we classify the itemsets into categories and prune redundant itemsets, which further reduces the memory cost and guarantees that the algorithm runs incrementally. Experimental results on two synthetic datasets and two real-world datasets show that, with high precision, FNMFIMoDS runs faster and uses much less memory than the state-of-the-art algorithm.
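
To make the idea of false negative mining with a Chernoff bound concrete, the following is a minimal Python sketch. It is not the FNMFIMoDS algorithm: the 3-tuple lists, the extended direct update tree and the itemset categories are not described in this preview, so the sketch keeps counts in a plain dictionary. The error margin sqrt(2 s ln(2/δ) / n) is one commonly used Chernoff-style form and is an assumption here, as are all function and parameter names (`chernoff_epsilon`, `mine_stream`, `section_size`, `max_size`).

```python
import math
from itertools import combinations


def chernoff_epsilon(min_support: float, delta: float, n: int) -> float:
    """Assumed Chernoff-style margin: with probability at least 1 - delta,
    the observed support after n transactions is within eps of the true one."""
    return math.sqrt(2.0 * min_support * math.log(2.0 / delta) / n)


def maximal_itemsets(itemsets):
    """Keep only the itemsets that have no proper superset in the collection."""
    sets = list(itemsets)
    return {s for s in sets if not any(s < other for other in sets)}


def mine_stream(transactions, min_support, delta, section_size, max_size=3):
    """Landmark-model sketch: count itemsets from the start of the stream and,
    at the end of every section, drop those whose running support is below
    min_support - eps. Counts are never overestimated, so a reported itemset
    is truly frequent; a pruned one may be missed (a false negative)."""
    counts = {}
    n = 0
    for transaction in transactions:
        n += 1
        items = sorted(set(transaction))
        # Enumerate all sub-itemsets up to max_size (exponential; toy data only).
        for k in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
        if n % section_size == 0:
            eps = chernoff_epsilon(min_support, delta, n)
            counts = {s: c for s, c in counts.items()
                      if c / n >= min_support - eps}
    if n == 0:
        return set()
    frequent = {s for s, c in counts.items() if c / n >= min_support}
    return maximal_itemsets(frequent)
```

For example, calling `mine_stream` on the six transactions `abc, ab, ac, abc, bc, ab` with `min_support=0.5`, `delta=0.1` and `section_size=3` yields the three maximal itemsets `{a, b}`, `{a, c}` and `{b, c}`; the itemset `{a, b, c}` occurs in only two of six transactions and is therefore not frequent. On such a short stream the margin is larger than the support threshold, so nothing is pruned; pruning only takes effect as the number of transactions grows and the margin shrinks.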

This research is supported by the National Science Foundation of China (61100112) and the Program for Innovation Research in Central University of Finance and Economics.

Author information

Authors and Affiliations

School of Information, Central University of Finance and Economics, Beijing, China, 100081
Haifeng Li & Ning Zhang

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Jie Tang & Jianyong Wang
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, SAR, China
Irwin King
Faculty of Engineering and Information Technology, University of Technology, 2007, Sydney, NSW, Australia
Ling Chen

About this paper

Cite this paper

Li, H., Zhang, N. (2011). A False Negative Maximal Frequent Itemset Mining Algorithm over Stream. In: Tang, J., King, I., Chen, L., Wang, J. (eds) Advanced Data Mining and Applications. ADMA 2011. Lecture Notes in Computer Science, vol 7120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25853-4_3

DOI: https://doi.org/10.1007/978-3-642-25853-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25852-7
Online ISBN: 978-3-642-25853-4
eBook Packages: Computer Science, Computer Science (R0)
