ABSTRACT
Heavy hitters are data items that occur at high frequency in a data set. They are among the most important items for an organization to summarize and understand during analytical processing. In data sets with sufficient skew, the number of heavy hitters can be relatively small. We take advantage of this small footprint to compute aggregate functions for the heavy hitters in fast cache memory in a single pass.
We design cache-resident, shared-nothing structures that hold only the most frequent elements. Our algorithm works in three phases. It first samples the input to pick heavy hitter candidates. It then builds a hash table and computes the exact aggregates of these candidates. Finally, a validation step identifies the true heavy hitters from among the candidates.
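The three phases above can be sketched in scalar pseudocode. This is a minimal illustration, not the paper's cache-resident implementation: the function name, parameters (`sample_size`, `capacity`), and the use of a plain dictionary in place of the compact hash table are our assumptions.

```python
from collections import Counter
import random

def heavy_hitters(stream, threshold, sample_size=1000, capacity=64, seed=0):
    """Three-phase heavy hitter aggregation (illustrative sketch only)."""
    stream = list(stream)
    rng = random.Random(seed)

    # Phase 1: sample the input and pick the most frequent sample
    # values as heavy hitter candidates (at most `capacity` of them).
    sample = [rng.choice(stream) for _ in range(sample_size)]
    candidates = [k for k, _ in Counter(sample).most_common(capacity)]

    # Phase 2: one pass over the data, aggregating exactly -- but only
    # for the candidates (stands in for the cache-resident hash table).
    table = {k: 0 for k in candidates}
    for item in stream:
        if item in table:
            table[item] += 1

    # Phase 3: validation -- keep only candidates whose exact count
    # reaches the frequency threshold.
    n = len(stream)
    return {k: c for k, c in table.items() if c >= threshold * n}
```

On a skewed input, the candidate set stays small enough that the exact aggregation in Phase 2 touches only a compact, cache-sized table.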
We identify trade-offs between the hash table configuration and performance. A configuration consists of the probing algorithm and the table capacity, which determines how many candidates can be aggregated. The probing algorithm can be perfect hashing, cuckoo hashing, or bucketized hashing, each offering a different trade-off between size and speed.
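The probing trade-off can be illustrated with a toy cuckoo table: each key has two possible slots, so a lookup probes at most two fixed locations regardless of table occupancy. This is a sketch under our own assumptions (hash functions, capacity, eviction limit), not the paper's design.

```python
class CuckooTable:
    """Toy two-choice cuckoo hash table for integer keys (illustrative)."""

    def __init__(self, capacity=16):
        self.cap = capacity
        self.slots = [None] * capacity  # each entry is a (key, value) pair

    def _h1(self, key):
        return key % self.cap

    def _h2(self, key):
        # Knuth-style multiplicative hash; choice is arbitrary here.
        return (key * 2654435761 >> 4) % self.cap

    def get(self, key):
        # A lookup touches at most two fixed locations.
        for idx in (self._h1(key), self._h2(key)):
            entry = self.slots[idx]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def put(self, key, value, max_kicks=32):
        entry = (key, value)
        idx = self._h1(key)
        for _ in range(max_kicks):
            if self.slots[idx] is None or self.slots[idx][0] == entry[0]:
                self.slots[idx] = entry
                return True
            # Evict the occupant and move it to its alternate slot.
            entry, self.slots[idx] = self.slots[idx], entry
            k = entry[0]
            idx = self._h2(k) if idx == self._h1(k) else self._h1(k)
        return False  # table too full; a real system would resize or rebuild
```

The bounded probe count is what makes cuckoo hashing attractive for a cache-resident table: lookups never degrade into long chains, at the cost of occasional insert-time evictions.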
We optimize performance using SIMD instructions in novel ways, beyond single vectorized operations, to minimize cache accesses and the instruction footprint.