
A Probabilistic Sketch for Summarizing Cold Items of Data Streams


Abstract:

Conventional sketches for counting stream item frequencies use hash functions to map data items into a concise structure, e.g., a two-dimensional array, at the expense of overcounting due to hash collisions. Despite their popularity, these sketches still struggle with cold (low-frequency) items, especially when space is limited. As errors from hash collisions accumulate, cold items can be misreported as hot (high-frequency) items, degrading estimation accuracy. We find that a stream item can be split into a set of compactly stored basic elements, which can be recomposed in a probabilistic manner to estimate the item's frequency. We therefore design a novel decomposition and recomposition framework, called the XY-sketch, which estimates the frequency of a stream item by estimating the probability that its basic elements appear in the data stream. We show that improving the estimation accuracy of cold items also benefits advanced streaming queries, such as top-k queries and heavy change queries. Throughout, we provide theoretical analysis and optimizations under space constraints. Experiments on real datasets examine the effectiveness of our proposals.
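
To make the overcounting problem concrete, the following is a minimal sketch of a Count-Min-style structure, one common example of the conventional hash-based sketches the abstract refers to. It is not the XY-sketch, whose design is not described on this page. The class name, the parameters `depth` and `width`, and the choice of BLAKE2b hashing are assumptions made for illustration only.

```python
import hashlib


class CountMinSketch:
    """Illustrative conventional sketch: a depth x width array of counters."""

    def __init__(self, depth=3, width=8):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by salting the item with the row id.
        data = f"{row}:{item}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        # Every row increments exactly one counter for the item.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only add to a counter, so the minimum over rows
        # is an upper bound on the true frequency (never an underestimate).
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


if __name__ == "__main__":
    # Hypothetical stream: a few hot items and many cold items seen once.
    stream = ["hot-a"] * 500 + ["hot-b"] * 300
    stream += [f"cold-{i}" for i in range(50)]

    sketch = CountMinSketch(depth=3, width=8)
    for item in stream:
        sketch.add(item)

    for item in ("hot-a", "hot-b", "cold-0", "cold-1"):
        print(item, "true:", stream.count(item), "estimated:", sketch.estimate(item))
```

With so few counters, a cold item that shares a cell with a hot item in every row inherits the hot item's count, which is the misreporting effect described above. In the extreme case `width=1`, every item collides and each estimate equals the total stream length.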
Published in: IEEE/ACM Transactions on Networking (Volume: 32, Issue: 2, April 2024)
Page(s): 1287 - 1302
Date of Publication: 05 October 2023
