Abstract
A data stream is a massive, unbounded sequence of data elements continuously generated at a rapid rate. For this reason, most algorithms for data streams sacrifice the correctness of their results for fast processing time. The processing time is greatly influenced by the amount of information that must be maintained. This issue becomes more serious when finding frequent itemsets or counting frequencies over an online transactional data stream, since a large number of itemsets may need to be monitored. We have proposed a method called the estDec method for finding frequent itemsets over an online data stream. To reduce the number of monitored itemsets in this method, monitoring the count of an itemset is delayed until its support is large enough for it to become a frequent itemset in the near future. For this purpose, the count of an itemset that has not been monitored must be estimated. Consequently, how the count of an itemset is estimated is a critical issue in minimizing memory usage as well as processing time. In this paper, the effects of various count estimation methods for finding frequent itemsets are analyzed in terms of mining accuracy, memory usage and processing time.
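The delayed-insertion idea described in the abstract can be illustrated with a minimal Python sketch. It assumes one particular estimator, namely the smallest count among the already monitored (n-1)-subsets of an itemset, which is an upper bound on its true count. The abstract does not specify the estimators the paper compares, so this choice, together with the names `estimate_count`, `process_transaction`, `run`, `insert_support` and `max_len`, is hypothetical. The sketch also leaves out everything else the full method may do, such as pruning itemsets that become infrequent.

```python
from itertools import combinations


def estimate_count(itemset, counts):
    """Assumed estimator: the smallest count among the monitored (n-1)-subsets.

    Returns 0 if any (n-1)-subset is not monitored yet.
    """
    size = len(itemset) - 1
    subsets = [frozenset(c) for c in combinations(sorted(itemset), size)]
    if any(s not in counts for s in subsets):
        return 0
    return min(counts[s] for s in subsets)


def process_transaction(txn, counts, total, insert_support, max_len=3):
    """Update monitored counts for one transaction and return the new total."""
    items = frozenset(txn)
    total += 1
    # Count every itemset that is already monitored and occurs in the transaction.
    for s in counts:
        if s <= items:
            counts[s] += 1
    # Single items are always monitored.
    for item in items:
        key = frozenset([item])
        if key not in counts:
            counts[key] = 1
    # Delayed insertion: start monitoring a longer itemset only when its
    # estimated support reaches the insertion threshold.
    for n in range(2, min(max_len, len(items)) + 1):
        for combo in combinations(sorted(items), n):
            key = frozenset(combo)
            if key in counts:
                continue
            est = estimate_count(key, counts)
            if est / total >= insert_support:
                counts[key] = est
    return total


def run(stream, insert_support, max_len=3):
    """Process a whole stream and return the table of monitored counts."""
    counts, total = {}, 0
    for txn in stream:
        total = process_transaction(txn, counts, total, insert_support, max_len)
    return counts


stream = [["a"], ["b"], ["c"], ["a", "b"], ["a", "b"]]
monitored = run(stream, insert_support=0.55)
print(monitored[frozenset("ab")])
```

In this small stream the pair {a, b} is not monitored after its first occurrence, because its estimated support (2/4) is below the threshold. It is inserted at its second occurrence with an estimated count of 3, although it has actually occurred only twice. That gap between the estimated and the true count is the kind of error that affects mining accuracy, while a tighter estimator or a higher threshold keeps fewer itemsets in memory.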
Author information
Joong Hyuk Chang received the B.S. and M.S. degrees in computer science from Yonsei University, Seoul, Korea, in 1996 and 1998, respectively. He is currently a Ph.D. candidate at the Department of Computer Science, Yonsei University, Seoul, Korea. His current research interests include mining data streams, query processing in data streams, data stream management systems, and data mining in databases.
Won Suk Lee received the B.S. degree in computer engineering from Boston University, Boston, MA, and the M.S. and Ph.D. degrees in electrical and computer engineering from Purdue University, West Lafayette, IN. He was a senior staff engineer at the Computer Division, Samsung Electronics. He is currently an associate professor in the Department of Computer Science at Yonsei University, Korea. His current research interests include data streams, data mining, anomaly intrusion detection, and mediator systems.
About this article
Cite this article
Chang, J.H., Lee, W.S. Effect of Count Estimation in Finding Frequent Itemsets over Online Transactional Data Streams. J Comput Sci Technol 20, 63–69 (2005). https://doi.org/10.1007/s11390-005-0007-3
DOI: https://doi.org/10.1007/s11390-005-0007-3