DOI: 10.1145/3093742.3093923

Performance Modeling of Stream Joins

Published: 08 June 2017

Abstract

Streaming analysis is widely used in a variety of environments, from cloud computing infrastructures to the network's edge. In these contexts, accurate modeling of streaming operators' performance enables fine-grained prediction of applications' behavior without the need for costly monitoring. This is of utmost importance for computationally expensive operators like stream joins, whose throughput and latency are very sensitive to rate-varying data streams, especially when deterministic processing is required.
In this paper, we present a modeling framework for estimating the throughput and the latency of stream join processing. The model is presented in an incremental step-wise manner, starting from a centralized non-deterministic stream join and expanding up to a deterministic parallel stream join. The model describes how the dynamics of throughput and latency are influenced by the number of physical input streams, as well as by the amount of parallelism in the actual processing and the requirement for determinism. We present an experimental validation of the model with respect to the actual implementation. The proposed model can provide insights that are catalytic for understanding the behavior of stream joins against different system deployments, with special emphasis on the influences of determinism and parallelization.

Published In

DEBS '17: Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems
June 2017
393 pages
ISBN: 9781450350655
DOI: 10.1145/3093742
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data Streaming
  2. Modeling
  3. Stream Join

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DEBS '17

Acceptance Rates

DEBS '17 Paper Acceptance Rate 22 of 60 submissions, 37%;
Overall Acceptance Rate 145 of 583 submissions, 25%

Article Metrics

  • Downloads (Last 12 months): 13
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 17 Jan 2025

Cited By

  • (2024) Toward Stream Processing Elasticity in Realistic Geo-Distributed Environments. 2024 IEEE International Conference on Cloud Engineering (IC2E), 118-125. DOI: 10.1109/IC2E61754.2024.00020. Online publication date: 24-Sep-2024
  • (2024) Evolutionary Computation Meets Stream Processing. Applications of Evolutionary Computation, 377-393. DOI: 10.1007/978-3-031-56852-7_24. Online publication date: 21-Mar-2024
  • (2022) Research Summary: Deterministic, Explainable and Efficient Stream Processing. Proceedings of the 2022 Workshop on Advanced tools, programming languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems, 65-69. DOI: 10.1145/3524053.3542750. Online publication date: 25-Jul-2022
  • (2022) Testing Self-Adaptive Software With Probabilistic Guarantees on Performance Metrics: Extended and Comparative Results. IEEE Transactions on Software Engineering 48:9, 3554-3572. DOI: 10.1109/TSE.2021.3101130. Online publication date: 1-Sep-2022
  • (2022) STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing. IEEE Transactions on Parallel and Distributed Systems 33:12, 4221-4238. DOI: 10.1109/TPDS.2022.3181979. Online publication date: 1-Dec-2022
  • (2022) Designing Self-Adaptive Software Systems with Control Theory: An Overview. 2022 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), 51-52. DOI: 10.1109/ACSOSC56246.2022.00027. Online publication date: Sep-2022
  • (2021) Motivations and Challenges for Stream Processing in Edge Computing. Companion of the ACM/SPEC International Conference on Performance Engineering, 17-18. DOI: 10.1145/3447545.3451899. Online publication date: 19-Apr-2021
  • (2021) Performance Prediction Method for Stream Computing Platform Based on Time Series. IEEE Access 9, 70322-70336. DOI: 10.1109/ACCESS.2021.3079207. Online publication date: 2021
  • (2020) The role of event-time order in data streaming analysis. Proceedings of the 14th ACM International Conference on Distributed and Event-based Systems, 214-217. DOI: 10.1145/3401025.3404088. Online publication date: 13-Jul-2020
  • (2020) Testing self-adaptive software with probabilistic guarantees on performance metrics. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1002-1014. DOI: 10.1145/3368089.3409685. Online publication date: 8-Nov-2020
