Deterministic Time-Series Joins for Asynchronous High-Throughput Data Streams | IEEE Conference Publication | IEEE Xplore

Abstract:

Many data stream problems that involve two or more streams rely on joining them on a common or similar timing attribute. With the advent of stream processing frameworks such as Apache Spark and Apache Flink in recent years, processing streamed data has become much easier. Repeatedly processing relatively small data batches in so-called windows increases flexibility with respect to implementation and task distribution across multiple nodes. Using event times instead of ingestion times avoids incorrect joins, among other problems. However, in this work we argue that batch processing leads to a significant trade-off between increased computational complexity and the latency of the resulting join pairs. We present a concept for time-series joins of streaming data. This concept, which builds on a resilient data stream framework, minimizes both computational cost and latency. It uses the guarantees provided by the underlying framework to join data records deterministically according to event times instead of processing times. This is a work-in-progress paper; detailed benchmarks are pending.
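
To make the idea of an event-time join concrete, the following is a minimal illustrative sketch in Python. It is not the algorithm from the paper, whose details are not given in the abstract. It assumes that each stream delivers its records in event-time order (the kind of guarantee an underlying framework might provide) and that two records match when their event times differ by at most a fixed tolerance. The names `event_time_join`, `Record`, and `tolerance`, and the sample data, are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical record layout: (event_time_in_ms, payload)
Record = Tuple[int, str]


def event_time_join(
    left: Iterable[Record],
    right: Iterable[Record],
    tolerance: int,
) -> Iterator[Tuple[Record, Record]]:
    """Yield (l, r) pairs whose event times differ by at most `tolerance`.

    Assumption: both inputs are sorted by event time. The result depends
    only on event times, never on when a record happens to be processed.
    """
    buffer: Deque[Record] = deque()  # right records that may still match
    right_iter = iter(right)
    pending = next(right_iter, None)  # next unread right record, if any

    for l in left:
        # Read right records up to the upper edge of the match interval.
        while pending is not None and pending[0] <= l[0] + tolerance:
            buffer.append(pending)
            pending = next(right_iter, None)
        # Drop right records too old for this and every later left record.
        while buffer and buffer[0][0] < l[0] - tolerance:
            buffer.popleft()
        # Everything left in the buffer lies inside the match interval.
        for r in buffer:
            yield (l, r)


left_stream = [(1000, "a"), (2000, "b"), (5000, "c")]
right_stream = [(500, "x"), (2200, "y"), (9000, "z")]
pairs = list(event_time_join(left_stream, right_stream, tolerance=500))
```

Because each record is buffered only while it can still match, the sketch emits a pair as soon as both partners have been read instead of waiting for a window to close. That is the kind of latency and cost saving the abstract argues for, under the ordering assumption stated above.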
Date of Conference: 08-11 September 2020
Date Added to IEEE Xplore: 05 October 2020
Conference Location: Vienna, Austria
