Mining top-k frequent items in a data stream with flexible sliding windows

Authors:

Hoang Thanh Lam,

Toon Calders

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 283 - 292

https://doi.org/10.1145/1835804.1835842

Published: 25 July 2010

Abstract

We study the problem of finding the k most frequent items in a stream of items for the recently proposed max-frequency measure. The max-frequency of an item is counted over a sliding window whose length changes dynamically based on the properties of that item. Besides being parameterless, this way of measuring the support of items was shown to detect bursts in a stream faster, especially if the set of items is heterogeneous. However, the algorithm that was proposed for maintaining all frequent items scales poorly when the number of items becomes large. Therefore, instead of reporting all frequent items, we propose in this paper to mine only the top-k most frequent ones. First we prove that solving this problem exactly still requires a prohibitive amount of memory (at least linear in the number of items). Yet, under some reasonable conditions, we show both theoretically and empirically that a memory-efficient algorithm exists. We implemented a prototype of this algorithm and present its memory efficiency on real-life data and in controlled experiments with synthetic data.
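To make the measure concrete, here is a minimal sketch of a naive, exact top-k computation. It assumes the usual reading of max-frequency from the earlier flexible-window work: the highest relative frequency of an item over all windows that end at the current position. The abstract does not spell out this definition, so treat it as an assumption. The function names and the `min_window` parameter are hypothetical, and this is not the memory-efficient algorithm the paper proposes. It keeps the whole stream in memory, which is the cost the paper sets out to avoid.

```python
def max_frequency(stream, item, min_window=1):
    """Best relative frequency of `item` over all windows ending at the
    most recent element, restricted to windows of length >= min_window.
    (Assumed definition; see the lead-in.)"""
    best = 0.0
    count = 0
    # Grow the window backwards from the most recent element.
    for length, x in enumerate(reversed(stream), start=1):
        if x == item:
            count += 1
        if length >= min_window:
            best = max(best, count / length)
    return best


def top_k(stream, k, min_window=1):
    """Exact top-k by max-frequency; stores and rescans the full stream."""
    scored = [(max_frequency(stream, a, min_window), a) for a in set(stream)]
    # Highest max-frequency first; ties broken by item for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(item, freq) for freq, item in scored[:k]]


stream = ["a", "a", "b", "a", "c", "c"]
result = top_k(stream, k=2, min_window=2)
print(result)
```

In this toy stream the recent burst of `c` outranks `a`, even though `a` occurs more often overall. This illustrates the burst sensitivity the abstract attributes to the measure.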

Index Terms

Mining top-k frequent items in a data stream with flexible sliding windows
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

July 2010

1240 pages

ISBN:9781450300551

DOI:10.1145/1835804

General Chairs: Bharat Rao (Siemens), Balaji Krishnapuram (Siemens)

Program Chairs: Andrew Tomkins (Google Inc.), Qiang Yang (Hong Kong University of Science and Technology)

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '10: The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

July 25 - 28, 2010

Washington, DC, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
