DOI: 10.1145/3328905.3329509
research-article

STRETCH: Scalable and Elastic Deterministic Streaming Analysis with Virtual Shared-Nothing Parallelism

Published: 24 June 2019

Abstract

Despite the established scientific knowledge on efficient parallel and elastic data stream processing, it remains challenging for stream processing frameworks to combine generality and a high level of abstraction (targeting ease of use) with fine-grained processing aspects (targeting efficiency). Towards this goal, we propose STRETCH, a framework that aims to guarantee (i) high efficiency, in terms of throughput and latency, for stateful analysis and (ii) fast elastic reconfigurations (without requiring state transfer) for intra-node streaming applications. To achieve these goals, we introduce virtual shared-nothing parallelization and propose a scheme to implement it in STRETCH, enabling users to leverage parallelization techniques while also taking advantage of shared-memory synchronization, which has been shown to boost the scaling-up of streaming applications while supporting determinism. We provide a fully implemented prototype and, together with a thorough evaluation, correctness proofs for its underlying claims supporting determinism, as well as a model (also validated empirically) of virtual shared-nothing and pure shared-nothing scalability behavior. As we show, STRETCH can match the throughput and latency figures of the leading state-of-the-art solutions, while also achieving fast elastic reconfigurations (taking only a few milliseconds).



Published In

DEBS '19: Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems
June 2019
291 pages
ISBN: 9781450367943
DOI: 10.1145/3328905

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data streaming
  2. Elasticity
  3. Scalability
  4. Shared-nothing parallelism

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DEBS '19

Acceptance Rates

DEBS '19 Paper Acceptance Rate 13 of 47 submissions, 28%;
Overall Acceptance Rate 145 of 583 submissions, 25%

Cited By

  • (2023) Bounding substreams in distributed stream processing. Information Systems 117:C. DOI: 10.1016/j.is.2023.102251. Online publication date: 1-Jul-2023
  • (2022) Substream management in distributed streaming dataflows. Proceedings of the 16th ACM International Conference on Distributed and Event-Based Systems (55-66). DOI: 10.1145/3524860.3539809. Online publication date: 27-Jun-2022
  • (2022) Research Summary: Deterministic, Explainable and Efficient Stream Processing. Proceedings of the 2022 Workshop on Advanced tools, programming languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems (65-69). DOI: 10.1145/3524053.3542750. Online publication date: 25-Jul-2022
  • (2022) pi-Lisco. Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing (460-469). DOI: 10.1145/3477314.3507093. Online publication date: 25-Apr-2022
  • (2022) STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing. IEEE Transactions on Parallel and Distributed Systems 33:12 (4221-4238). DOI: 10.1109/TPDS.2022.3181979. Online publication date: 1-Dec-2022
  • (2022) Multiple pattern matching for network security applications. Journal of Parallel and Distributed Computing 137:C (34-52). DOI: 10.1016/j.jpdc.2019.10.011. Online publication date: 21-Apr-2022
  • (2022) Elastic Resource Management in Stream Processing. Encyclopedia of Big Data Technologies (1-7). DOI: 10.1007/978-3-319-63962-8_191-2. Online publication date: 17-Mar-2022
  • (2020) The role of event-time order in data streaming analysis. Proceedings of the 14th ACM International Conference on Distributed and Event-based Systems (214-217). DOI: 10.1145/3401025.3404088. Online publication date: 13-Jul-2020
  • (2019) Adaptive Stream-based Shifting Bottleneck Detection in IoT-based Computing Architectures. 2019 24th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA) (993-1000). DOI: 10.1109/ETFA.2019.8868962. Online publication date: 10-Sep-2019
