Semi-Clairvoyant Scheduling in Data Analytics Systems

Abstract:

Popular data analytics systems, including Apache Hadoop, Dryad, and Apache Spark, abstract jobs as directed acyclic graphs (DAGs). Speeding up the completion of DAG jobs matters in practice because it supports real-time decisions. State-of-the-art works propose clairvoyant schedulers to optimize this goal; however, they assume complete job information as prior knowledge, including the precise DAG structure and the fine-grained resource requirement and duration of each task. This assumption limits their applicability. In this paper, to be more practical, we relax the complete prior knowledge assumption and rely solely on partial prior information, based on which we design Cobra, a semi-clairvoyant task scheduler that operates within each job. When managing resources for a job, Cobra adaptively adjusts its resource desire in a multiplicative-increase multiplicative-decrease manner on the basis of its current resource utilization and the presence of waiting tasks. When assigning tasks to the allocated resources, Cobra strives to satisfy each task's locality preference by letting the task wait for a time bounded by a parameterized threshold. Even with only partial prior job information, when a set of jobs, each employing Cobra as its task scheduler, runs on a cluster that employs the fair job scheduler, we theoretically prove that the resulting makespan and average job response time are O(1)-competitive in different settings. We implement our design in Spark on YARN, and use experiments from both real deployments and simulations on Google's trace to verify the performance improvement and sensitivity of Cobra.
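
The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not Cobra's actual algorithm: the abstract does not give the update rule, so the function names (`next_desire`, `pick_node`), the scaling factor, the utilization threshold, and the way waiting time is counted are all assumptions made for illustration only.

```python
def next_desire(desire, utilization, has_waiting_tasks,
                factor=2.0, util_threshold=0.8, min_desire=1.0):
    """Hypothetical multiplicative-increase multiplicative-decrease update.

    desire            -- resources the job currently asks for
    utilization       -- fraction of the current allocation in use (0..1)
    has_waiting_tasks -- True if runnable tasks are still queued
    factor, util_threshold, min_desire are assumed tuning parameters.
    """
    # Allocation is well used and work is still queued: ask for more.
    if utilization >= util_threshold and has_waiting_tasks:
        return desire * factor
    # Allocation is poorly used and nothing is queued: ask for less.
    if utilization < util_threshold and not has_waiting_tasks:
        return max(min_desire, desire / factor)
    # Mixed signals: keep the current desire.
    return desire


def pick_node(preferred_nodes, free_nodes, waited, max_wait):
    """Hypothetical locality-aware placement with a bounded wait.

    Returns a node to run on, or None if the task should keep waiting.
    max_wait plays the role of the parameterized threshold.
    """
    # Prefer a free node that satisfies the task's locality preference.
    for node in free_nodes:
        if node in preferred_nodes:
            return node
    # Once the wait reaches the threshold, accept any free node.
    if waited >= max_wait and free_nodes:
        return free_nodes[0]
    return None
```

In this sketch a job with desire 4.0, 90% utilization, and queued tasks would double its desire, while an idle job at 30% utilization would halve it; a task gives up on locality only after `waited` reaches `max_wait`.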
Published in: IEEE Transactions on Computers (Volume: 68, Issue: 9, 01 September 2019)
Page(s): 1376 - 1389
Date of Publication: 19 March 2019
