Article

Hardware-Conscious Sliding Window Aggregation on GPUs

Authors:
Georgios Michas (National and Kapodistrian University of Athens, Greece)
Periklis Chrysogelos (EPFL, Switzerland)
Ioannis Mytilinis (EPFL, Switzerland)
Anastasia Ailamaki (EPFL, RAW Labs SA, Switzerland)

DAMON '21: Proceedings of the 17th International Workshop on Data Management on New Hardware, June 2021, Article No.: 13, Pages 1–5. https://doi.org/10.1145/3465998.3466014

Published: 20 June 2021

ABSTRACT

Stream Processing Engines (SPEs) have recently begun utilizing heterogeneous coprocessors (e.g., GPUs) to meet the velocity requirements of modern real-time applications. The massive parallelism and high memory bandwidth of GPUs can significantly increase processing throughput in data-intensive streaming scenarios, such as windowed aggregations. However, previous research focused only on the overall architecture of hybrid CPU-GPU streaming systems, and the need for efficient in-GPU window operators was overshadowed by the limited interconnect bandwidth.

Because aggregation takes up a significant portion of streaming workloads, in this work we analyze and optimize the performance of sliding window aggregates on GPUs. Current implementations under-utilize the hardware, and for a range of query parameters they cannot even saturate the bandwidth of the interconnect. To optimize execution, we first evaluate the fundamental building blocks of streaming aggregation for GPUs and identify the performance bottlenecks. Then, we build Slider: an adaptive algorithm that selects the most appropriate primitives and kernel configurations based on the query parameters. Our evaluation shows that Slider outperforms previous approaches by 3× to 1250×, and saturates both the interconnect and the memory bandwidth for a wide range of examined input workloads.
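For readers unfamiliar with the operator, the following is a minimal CPU-side sketch of a sliding window aggregate. It is not the Slider algorithm and not taken from the paper. It assumes count-based windows, a sum aggregate, and hypothetical function and parameter names (`sliding_window_sums`, `window`, `slide`), and it uses a prefix sum so that each window costs one subtraction instead of a full re-scan.

```python
from itertools import accumulate


def sliding_window_sums(values, window, slide):
    """Return the sum of every full count-based window.

    Assumptions (illustrative only): each window covers `window`
    consecutive items and consecutive windows start `slide` items apart.
    """
    if window <= 0 or slide <= 0:
        raise ValueError("window and slide must be positive")
    # prefix[i] is the sum of values[:i], so prefix[0] == 0.
    prefix = [0] + list(accumulate(values))
    last_start = len(values) - window  # negative if no full window fits
    return [
        prefix[start + window] - prefix[start]
        for start in range(0, last_start + 1, slide)
    ]


def naive_window_sums(values, window, slide):
    """Reference version that re-scans every window from scratch."""
    last_start = len(values) - window
    return [
        sum(values[start:start + window])
        for start in range(0, last_start + 1, slide)
    ]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    print(sliding_window_sums(data, window=3, slide=1))
    print(sliding_window_sums(data, window=3, slide=2))
```

The prefix-sum form only works for invertible aggregates such as sum or count. Aggregates like min or max need a different building block, which is one reason the right primitive can depend on the query parameters.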

Published in

DAMON '21: Proceedings of the 17th International Workshop on Data Management on New Hardware
June 2021
104 pages
ISBN: 9781450385565
DOI: 10.1145/3465998

Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
GPU
aggregation
hardware
sliding window
stream processing
Qualifiers
- Article
- Research
- Refereed limited
Acceptance Rates
DAMON '21 Paper Acceptance Rate: 15 of 17 submissions, 88%. Overall Acceptance Rate: 80 of 102 submissions, 78%.