Abstract:
A significant portion of research in the past decade has been devoted to developing resource allocation and task scheduling solutions for large-scale data processing platforms. Such algorithms are designed to facilitate the deployment of data analytic applications across either conventional cluster computing systems or modern virtualized data centers. The main reason for this large research effort is that even a slight improvement in the performance of such platforms can bring considerable monetary savings for vendors, especially for modern data processing engines designed solely to perform high-throughput and/or low-latency computations over massive-scale batch or streaming data. A challenging question yet to be answered in this context is how to design an effective resource allocation solution that can prevent low resource utilization while meeting an enforced performance level (such as the 99th latency percentile) in circumstances where contention among applications for the capacity of shared resources is a non-negligible performance-limiting factor. This paper proposes a resource controller system, called QSpark, to cope with the problems of (i) low performance (i.e., low resource utilization in the batch mode and high p-99 response time in the streaming mode) and (ii) shared-resource interference among collocated applications in a multi-tenant modern Spark platform. The proposed solution leverages a set of controlling mechanisms for dynamic partitioning of computing resource allocations, so that it can fulfill the QoS requirements of latency-critical data processing applications while enhancing the throughput of all worker nodes without reaching their saturation points.
Through extensive experiments on our in-house Spark cluster, we compared the performance of the proposed solution against the default Spark resource allocation policy for a variety of Machine Learning (ML), Artificial Intelligence (AI), and De...
Date of Conference: 23-26 November 2021
Date Added to IEEE Xplore: 31 January 2022