DOI: 10.1145/1835804.1835842
research-article

Mining top-k frequent items in a data stream with flexible sliding windows

Published: 25 July 2010 Publication History

Abstract

We study the problem of finding the k most frequent items in a stream of items under the recently proposed max-frequency measure. The max-frequency of an item is counted over a sliding window whose length changes dynamically, depending on the properties of the item. Besides being parameterless, this way of measuring the support of items has been shown to detect bursts in a stream faster, especially when the set of items is heterogeneous. However, the algorithm that was proposed for maintaining all frequent items scales poorly when the number of items becomes large. In this paper we therefore propose to mine only the top-k most frequent items instead of reporting all frequent ones. We first prove that solving this problem exactly still requires a prohibitive amount of memory (at least linear in the number of items). Yet, under some reasonable conditions, we show both theoretically and empirically that a memory-efficient algorithm exists. We implemented a prototype of this algorithm and report its memory efficiency on real-life data and in controlled experiments with synthetic data.
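To make the measure concrete, the sketch below computes max-frequencies naively and picks the top-k items. It rests on an assumption the abstract does not spell out: that the max-frequency of an item is its highest relative frequency over any window ending at the current position. The names `max_frequencies` and `top_k`, and the `min_window` parameter, are hypothetical and introduced only for illustration. This is not the memory-efficient algorithm of the paper; it keeps the whole stream in memory.

```python
from collections import Counter
import heapq


def max_frequencies(stream, min_window=1):
    """Return {item: max-frequency} at the end of `stream`.

    Assumed definition: the maximum, over all windows that end at the last
    position and hold at least `min_window` items, of the item's relative
    frequency in that window. `min_window` is an illustrative parameter.
    """
    counts = Counter()
    best = {}
    # Grow the window backwards from the most recent item.
    for length, item in enumerate(reversed(stream), start=1):
        counts[item] += 1
        if length < min_window:
            continue
        if length == min_window:
            # First admissible window: every item seen so far is a candidate.
            candidates = list(counts)
        else:
            # Extending the window lowers the frequency of every item except
            # the one just added, so only that item can reach a new maximum.
            candidates = [item]
        for c in candidates:
            best[c] = max(best.get(c, 0.0), counts[c] / length)
    return best


def top_k(stream, k, min_window=1):
    """Return the k (item, max-frequency) pairs with the highest max-frequency."""
    scores = max_frequencies(stream, min_window)
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    stream = list("aabbbcab")
    for item, score in top_k(stream, k=2, min_window=3):
        print(item, round(score, 3))
```

With `min_window=1` the most recent item always scores 1.0, because the window of length one contains only that item, so a larger minimum length gives a more informative ranking in this sketch.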

Supplementary Material

JPG File (kdd2010_thanh_lam_mtf_01.jpg)
MOV File (kdd2010_thanh_lam_mtf_01.mov)

    Published In

    KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
    July 2010
    1240 pages
    ISBN:9781450300551
    DOI:10.1145/1835804
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data stream mining
    2. top-k frequent items

    Qualifiers

    • Research-article

    Conference

    KDD '10
    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2023) LotterySampling: A Randomized Algorithm for the Heavy Hitters and Top-k Problems in Data Streams. Computing and Combinatorics, pp. 24-35. DOI: 10.1007/978-3-031-22105-7_3. Online publication date: 1-Jan-2023.
    • (2020) Fuzzy Association Rule Mining Algorithm Based on Load Classifier. Data Science, pp. 178-191. DOI: 10.1007/978-981-15-2810-1_18. Online publication date: 2-Feb-2020.
    • (2018) Maximally informative k-itemset mining from massively distributed data streams. Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pp. 502-509. DOI: 10.1145/3167132.3167187. Online publication date: 9-Apr-2018.
    • (2017) Real time utility-based recommendation for revenue optimization via an adaptive online Top-K high utility itemsets mining model. 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), pp. 1859-1866. DOI: 10.1109/FSKD.2017.8393050. Online publication date: Jul-2017.
    • (2017) Scalable and adaptive collaborative filtering by mining frequent item co-occurrences in a user feedback stream. Engineering Applications of Artificial Intelligence 58, pp. 171-184. DOI: 10.1016/j.engappai.2016.10.011. Online publication date: Feb-2017.
    • (2016) Frequent Itemsets Mining in Data Streams Using Reconfigurable Hardware. New Frontiers in Mining Complex Patterns, pp. 32-45. DOI: 10.1007/978-3-319-39315-5_3. Online publication date: 18-May-2016.
    • (2015) Frequent itemsets mining in data streams using reconfigurable hardware. Proceedings of the 4th International Conference on New Frontiers in Mining Complex Patterns, pp. 32-45. DOI: 10.5555/3122094.3122098. Online publication date: 7-Sep-2015.
    • (2015) A Novel Gaussian Based Similarity Measure for Clustering Customer Transactions Using Transaction Sequence Vector. Procedia Technology 19, pp. 880-887. DOI: 10.1016/j.protcy.2015.02.126. Online publication date: 2015.
    • (2014) Data Mining – Past, Present and Future – A Typical Survey on Data Streams. Procedia Technology 12, pp. 255-263. DOI: 10.1016/j.protcy.2013.12.483. Online publication date: 2014.
    • (2014) Clustering Text Data Streams – A Tree based Approach with Ternary Function and Ternary Feature Vector. Procedia Computer Science 31, pp. 976-984. DOI: 10.1016/j.procs.2014.05.350. Online publication date: 2014.