Conferences >2018 IEEE International Confe...

Scalable Distributed Top-k Join Queries in Topic-Based Pub/Sub Systems

Download PDF
Download References
Request Permissions
Save to
Alerts

Abstract:

In this paper, we provide a novel approach that enables the execution of top-k join queries over sliding windows in a way that reduces the amount of data that need to be ...Show More

Metadata

Abstract:

In this paper, we provide a novel approach that enables the execution of top-k join queries over sliding windows in a way that reduces the amount of data that need to be analyzed by the stream processing operators. The main idea is that brokers individually invoke the query on their received messages and forward the top-k results to a stream processing operator that performs the merging of the results and provides to the end-user the final top-k results. Moreover, our system exploits the Bayesian Optimization technique to determine automatically the number of top-k results that should be provided by each broker. Our approach has been developed in the Kappa architecture that exploits topic-based scalable publish/subscribe (pub/sub) systems like Apache Kafka to efficiently forward the high volume of incoming messages to distributed processing systems (i.e., Apache Spark or Apache Flink) that perform the batch and stream analytics operations. Our detailed experimental evaluation on our local cluster illustrates that we can efficiently execute top-k join queries on our system with high accuracy and low latency.

Published in: 2018 IEEE International Conference on Big Data (Big Data)

Date of Conference: 10-13 December 2018

Date Added to IEEE Xplore: 24 January 2019

ISBN Information:

DOI: 10.1109/BigData.2018.8621949

Conference Location: Seattle, WA, USA