DOI: 10.1145/3465998.3466014
Article

Hardware-Conscious Sliding Window Aggregation on GPUs

Published: 20 June 2021

ABSTRACT

Stream Processing Engines (SPEs) have recently begun utilizing heterogeneous coprocessors (e.g., GPUs) to meet the velocity requirements of modern real-time applications. The massive parallelism and high memory bandwidth of GPUs can significantly increase processing throughput in data-intensive streaming scenarios, such as windowed aggregations. However, previous research focused only on the overall architecture of hybrid CPU-GPU streaming systems, and the need for efficient in-GPU window operators was overshadowed by the limited interconnect bandwidth.

Because aggregation takes up a significant portion of streaming workloads, in this work we analyze and optimize the performance of sliding window aggregates on GPUs. Current implementations under-utilize the hardware, and for a range of query parameters they cannot even saturate the bandwidth of the interconnect. To optimize execution, we first evaluate the fundamental building blocks of streaming aggregation for GPUs and identify the performance bottlenecks. Then, we build Slider: an adaptive algorithm that selects the most appropriate primitives and kernel configurations based on the query parameters. Our evaluation shows that Slider outperforms previous approaches by 3× to 1250×, and saturates both the interconnect and the memory bandwidth for a wide range of examined input workloads.
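For readers unfamiliar with the operator, the following is a minimal CPU-side sketch of what a sliding window aggregate computes. It is not the Slider algorithm and does not reflect the paper's GPU kernels. It assumes count-based windows, a sum aggregate, and a prefix-sum formulation chosen here purely for illustration; the function name `sliding_window_sums` and its parameters `window_size` and `slide` are hypothetical.

```python
from itertools import accumulate


def sliding_window_sums(values, window_size, slide):
    """Return the sum of every full count-based window.

    Each window covers `window_size` consecutive items and consecutive
    windows start `slide` items apart. Illustrative only: this is a
    sequential CPU sketch, not a GPU implementation.
    """
    if window_size <= 0 or slide <= 0:
        raise ValueError("window_size and slide must be positive")

    # prefix[i] is the sum of values[:i], so prefix[0] == 0.
    prefix = [0] + list(accumulate(values))

    # The sum of values[start:start + window_size] is a difference of
    # two prefix sums, so each window costs O(1) after the scan.
    last_start = len(values) - window_size
    return [
        prefix[start + window_size] - prefix[start]
        for start in range(0, last_start + 1, slide)
    ]


if __name__ == "__main__":
    print(sliding_window_sums([1, 2, 3, 4, 5, 6], window_size=4, slide=2))
```

The two query parameters in this sketch, window size and slide, are the kind of parameters an adaptive approach could inspect when choosing among primitives; how Slider actually makes that choice is described in the paper, not here.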

  • Published in

    DAMON '21: Proceedings of the 17th International Workshop on Data Management on New Hardware
    June 2021
    104 pages
    ISBN:9781450385565
    DOI:10.1145/3465998

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article
    • Research
    • Refereed limited

    Acceptance Rates

    DAMON '21 Paper Acceptance Rate: 15 of 17 submissions, 88%
    Overall Acceptance Rate: 80 of 102 submissions, 78%
