A Two-List Framework for Accurate Detection of Frequent Items in Data Streams

Vengerov, David

doi:10.1007/978-3-319-96136-1_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10934)

Included in the following conference series:

International Conference on Machine Learning and Data Mining in Pattern Recognition

Abstract

Detecting the most frequent items in large data sets, and providing accurate frequency estimates for those items, is increasingly important in a variety of domains. We propose a new two-list framework for this problem, which extends the state-of-the-art Filtered Space-Saving (FSS) algorithm. We present FSSA, an efficient array-based implementation of this framework, together with an adaptive version that adjusts the relative sizes of the two lists based on the estimated number of distinct keys in the data set. An analytical comparison with the FSS algorithm shows that FSSA has smaller expected frequency estimation errors, and experiments on both artificial and real workloads confirm this result. We also give a theoretical analysis of the space and time complexity of FSSA and its benchmark algorithms. Finally, we show that the FSS2L framework can be naturally parallelized, leading to a linear decrease in the maximum frequency estimation error.
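
For background, the FSS algorithm named in the abstract builds on the classic Space-Saving counter scheme. The following is a minimal Python sketch of plain Space-Saving only. It is not the paper's FSSA or its two-list framework, and the class name `SpaceSaving`, the `capacity` parameter, and the `update` and `top` methods are illustrative assumptions rather than names taken from the paper.

```python
class SpaceSaving:
    """Sketch of the classic Space-Saving frequent-items summary.

    Hypothetical illustration: this is NOT the FSSA algorithm from the paper.
    It tracks at most `capacity` keys, each with an estimated count and an
    upper bound on how much that count may overestimate the true frequency.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts = {}  # key -> estimated count (never below the true count)
        self.errors = {}  # key -> maximum possible overestimation

    def update(self, key):
        if key in self.counts:
            # Monitored key: its count is simply incremented.
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            # Free slot: start monitoring the key with an exact count.
            self.counts[key] = 1
            self.errors[key] = 0
        else:
            # No free slot: evict the key with the smallest count and let the
            # new key inherit that count as its possible overestimation.
            # A linear scan is used here for brevity, not efficiency.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[key] = floor + 1
            self.errors[key] = floor

    def top(self, k):
        """Return the k monitored keys with the largest estimated counts."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


summary = SpaceSaving(capacity=2)
for item in ["a"] * 5 + ["b"] * 3 + ["c", "d", "a"]:
    summary.update(item)
print(summary.top(1))  # [('a', 6)]
```

In this sketch the estimated count of a key is never below its true frequency, and the count minus the stored error is never above it. Error introduced on eviction is the kind of frequency estimation error that the abstract says the FSS-based approaches aim to reduce.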

Author information

Authors and Affiliations

Oracle Labs, Belmont, CA, 94002, USA
David Vengerov

Corresponding author

Correspondence to David Vengerov.

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Vengerov, D. (2018). A Two-List Framework for Accurate Detection of Frequent Items in Data Streams. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol 10934. Springer, Cham. https://doi.org/10.1007/978-3-319-96136-1_19

DOI: https://doi.org/10.1007/978-3-319-96136-1_19
Published: 08 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96135-4
Online ISBN: 978-3-319-96136-1
eBook Packages: Computer Science, Computer Science (R0)
