Abstract
Maximal frequent itemsets are one of several condensed representations of frequent itemsets: they retain most of the information contained in the full set of frequent itemsets while using less space, which makes them better suited to stream mining. This paper focuses on approximate mining of maximal frequent itemsets over a stream under the landmark model. We divide the continuously arriving transactions into sections and maintain them in 3-tuple lists indexed by an extended direct update tree; on this basis, we propose an efficient algorithm named FNMFIMoDS. The algorithm employs the Chernoff bound to mine maximal frequent itemsets in a false-negative manner. In addition, we classify the itemsets into categories and prune some redundant itemsets, which further reduces the memory cost and guarantees that the algorithm runs incrementally. Experimental results on two synthetic datasets and two real-world datasets show that, while maintaining high precision, FNMFIMoDS runs faster and uses much less memory than the state-of-the-art algorithm.
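To make the two central notions concrete, the following is a minimal sketch and not the FNMFIMoDS algorithm itself. It assumes a brute-force enumeration over a tiny in-memory transaction list, and it assumes the Chernoff-derived error bound takes the form eps_n = sqrt(2 s ln(2/delta) / n), which is one form used in false-negative stream mining; whether the paper uses exactly this form is not stated in the abstract. The names chernoff_epsilon, frequent_itemsets, and maximal are hypothetical helpers introduced only for illustration.

```python
import math
from itertools import combinations


def chernoff_epsilon(s, delta, n):
    """Assumed error bound: eps_n = sqrt(2 * s * ln(2 / delta) / n).

    s is the minimum support, delta the allowed failure probability,
    and n the number of transactions seen so far.
    """
    return math.sqrt(2.0 * s * math.log(2.0 / delta) / n)


def frequent_itemsets(transactions, min_support):
    """Brute force: map each itemset to its relative support if it is frequent."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            c = frozenset(cand)
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                result[c] = support
    return result


def maximal(itemsets):
    """Keep only the itemsets that have no frequent proper superset."""
    return {x for x in itemsets if not any(x < y for y in itemsets)}


# Toy data (illustrative only): five transactions over items a, b, c, d.
transactions = [frozenset(t) for t in ("abc", "abc", "ab", "ac", "d")]
s, delta = 0.4, 0.1

frequent = frequent_itemsets(transactions, s)
mfi = maximal(frequent)

# The bound shrinks as more transactions arrive.
eps_small = chernoff_epsilon(s, delta, 1_000)
eps_large = chernoff_epsilon(s, delta, 100_000)
print(mfi, eps_small, eps_large)
```

In a false-negative scheme of this kind, an itemset whose running support falls below s - eps_n is dropped from memory early, so a truly frequent itemset may occasionally be missed, but every itemset that is reported meets the support threshold s. The toy data also shows the saving from the condensed representation: seven itemsets are frequent, yet a single maximal itemset covers all of them.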
This research is supported by the National Science Foundation of China (61100112) and the Program for Innovation Research in Central University of Finance and Economics.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, H., Zhang, N. (2011). A False Negative Maximal Frequent Itemset Mining Algorithm over Stream. In: Tang, J., King, I., Chen, L., Wang, J. (eds) Advanced Data Mining and Applications. ADMA 2011. Lecture Notes in Computer Science, vol 7120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25853-4_3
DOI: https://doi.org/10.1007/978-3-642-25853-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25852-7
Online ISBN: 978-3-642-25853-4
eBook Packages: Computer Science, Computer Science (R0)