Abstract:
In this paper, we provide a novel approach that enables the execution of top-k join queries over sliding windows in a way that reduces the amount of data that need to be ...Show MoreMetadata
Abstract:
In this paper, we provide a novel approach that enables the execution of top-k join queries over sliding windows in a way that reduces the amount of data that need to be analyzed by the stream processing operators. The main idea is that brokers individually invoke the query on their received messages and forward the top-k results to a stream processing operator that performs the merging of the results and provides to the end-user the final top-k results. Moreover, our system exploits the Bayesian Optimization technique to determine automatically the number of top-k results that should be provided by each broker. Our approach has been developed in the Kappa architecture that exploits topic-based scalable publish/subscribe (pub/sub) systems like Apache Kafka to efficiently forward the high volume of incoming messages to distributed processing systems (i.e., Apache Spark or Apache Flink) that perform the batch and stream analytics operations. Our detailed experimental evaluation on our local cluster illustrates that we can efficiently execute top-k join queries on our system with high accuracy and low latency.
Date of Conference: 10-13 December 2018
Date Added to IEEE Xplore: 24 January 2019
ISBN Information: