
Out of Many We are One: Measuring Item Batch with Clock-Sketch

Authors:

Tong Yang

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data

Pages 261 - 273

https://doi.org/10.1145/3448016.3452784

Published: 18 June 2021


Abstract

Item batch denotes a consecutive sequence of identical items that are close in time in a data stream. It is a useful data stream pattern in caching, burst detection, APT detection, etc. Basic item batch measurement tasks include membership, cardinality, time span, and size. Currently, there is no algorithm tailored for item batch measurement. The greatest challenge lies in accurately estimating the time gap between two consecutive identical items. In this paper, we propose Clock-sketch, a framework that introduces the well-known CLOCK algorithm into item batch measurement. The methodology of Clock-sketch is to clean outdated information as much as possible, while guaranteeing that the information of all items visited within the time window $\mathcal{T}$ is preserved. We conduct experiments on three real-world datasets that exhibit the item batch pattern. We compare the accuracy and throughput of Clock-sketch against the state-of-the-art and two naive approaches that do not use the Clock-sketch technique. Results on item batch activeness show that Clock-sketch outperforms the state-of-the-art SWAMP, achieving a 50 times lower false positive rate when memory is small. All source code is open-sourced and released on GitHub.
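To make the cleaning idea concrete, the following is a minimal sketch of one way a CLOCK-style sweep could support the activeness (membership) task. It is not taken from the paper's released code. The class name `ClockSketchFilter`, its parameters, the SHA-256 based hashing, and the sweep rate are all assumptions chosen for illustration: each cell is a small counter set to its maximum value on insertion, and a clock hand decrements cells at a rate of (max value - 1) full sweeps per window, so a cell written at time t stays nonzero at least until t + T.

```python
import hashlib


class ClockSketchFilter:
    """Hypothetical clock-swept filter; an illustration, not the authors' code."""

    def __init__(self, num_cells: int, cell_bits: int, num_hashes: int, window: int):
        if cell_bits < 2:
            raise ValueError("cell_bits must be at least 2")
        self.n = num_cells
        self.max_val = (1 << cell_bits) - 1   # value written on every insertion
        self.k = num_hashes
        self.window = window                  # time window T, in integer time units
        self.cells = [0] * num_cells
        self.steps_done = 0                   # cells visited by the clock hand so far

    def _indexes(self, item: str):
        # Assumed hashing scheme: k salted SHA-256 digests reduced modulo n.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def _advance(self, now: int) -> None:
        # The hand completes (max_val - 1) sweeps per window, so a cell set to
        # max_val is decremented at most (max_val - 1) times within T.
        target = now * self.n * (self.max_val - 1) // self.window
        pending = target - self.steps_done
        if pending >= self.n * self.max_val:
            # Enough sweeps have elapsed to clear every cell.
            self.cells = [0] * self.n
        else:
            for step in range(self.steps_done, target):
                idx = step % self.n
                if self.cells[idx] > 0:
                    self.cells[idx] -= 1
        self.steps_done = max(self.steps_done, target)

    def insert(self, item: str, now: int) -> None:
        self._advance(now)
        for idx in self._indexes(item):
            self.cells[idx] = self.max_val

    def is_active(self, item: str, now: int) -> bool:
        # May return false positives (hash collisions), never false negatives
        # for items inserted within the last `window` time units.
        self._advance(now)
        return all(self.cells[idx] > 0 for idx in self._indexes(item))


f = ClockSketchFilter(num_cells=64, cell_bits=2, num_hashes=2, window=100)
f.insert("a", now=0)
print(f.is_active("a", now=100))  # True: still within the window
print(f.is_active("a", now=150))  # False: three full sweeps have cleared the cells
```

Under these assumptions, stale information is removed no later than T * max_val / (max_val - 1) after the last insertion, so wider cells tighten the gap between the guaranteed retention time and the actual expiry time at the cost of more memory per cell. Timestamps are assumed to be non-decreasing.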

Supplementary Material

MP4 File (3448016.3452784.mp4)


References

[1] Eran Assaf, Ran Ben Basat, Gil Einziger, and Roy Friedman. 2018. Pay for a sliding bloom filter and get counting, distinct elements, and entropy for free. In IEEE INFOCOM 2018-IEEE Conference on Computer Communications. IEEE, 2204-2212.
