DOI: 10.1145/2612669.2612702

Deadline-aware scheduling of big-data processing jobs

Published: 21 June 2014

Abstract

This paper presents a novel algorithm for scheduling big data jobs on large compute clusters. In our model, each job is represented by a DAG consisting of several stages linked by precedence constraints. The resource allocation per stage is malleable, in the sense that the processing time of a stage depends on the resources allocated to it (the dependency can be arbitrary in general). The goal of the scheduler is to maximize the total value of completed jobs, where the value of each job depends on its completion time. We design an algorithm for the problem that guarantees an expected constant approximation factor when the cluster capacity is sufficiently high. To the best of our knowledge, this is the first constant-factor approximation algorithm for the problem. The algorithm is based on formulating the problem as a linear program and then rounding an optimal (fractional) solution into a feasible (integral) schedule using randomized rounding.
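The job model in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the stage names, the linear-speedup duration functions, the step-shaped value function, and the helper names `completion_time`, `job_value`, and `round_job` are all assumptions made for illustration. The sketch ignores cluster capacity and shows only (a) how a job's completion time follows from per-stage allocations under precedence constraints and (b) the generic randomized-rounding step of treating fractional LP values as selection probabilities.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def completion_time(preds, alloc, duration):
    """Finish time of a job DAG when each stage starts as soon as all of
    its predecessors have finished (cluster capacity is ignored here)."""
    finish = {}
    for stage in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds.get(stage, ())), default=0.0)
        # Malleability: the stage's processing time depends on its allocation.
        finish[stage] = start + duration[stage](alloc[stage])
    return max(finish.values())


def job_value(finish, deadline, value):
    """Assumed step-shaped value: full value by the deadline, zero after."""
    return value if finish <= deadline else 0.0


def round_job(fractions, rng):
    """Generic randomized rounding: pick one option with probability equal
    to its fractional weight; return None with the leftover probability."""
    u = rng.random()
    acc = 0.0
    for option, frac in fractions.items():
        acc += frac
        if u < acc:
            return option
    return None


# Hypothetical four-stage job: extract -> {clean, stats} -> report.
preds = {
    "extract": [],
    "clean": ["extract"],
    "stats": ["extract"],
    "report": ["clean", "stats"],
}
# Assumed linear speedup: processing time = work / resources.
work = {"extract": 40, "clean": 60, "stats": 20, "report": 10}
duration = {s: (lambda r, w=w: w / r) for s, w in work.items()}
alloc = {"extract": 4, "clean": 2, "stats": 1, "report": 1}

finish = completion_time(preds, alloc, duration)
```

With these assumed numbers the critical path is extract, clean, report, so `finish` is 50.0 and a deadline of 60 is met. Cutting the allocation of `clean` to one unit pushes completion to 80 and the assumed value drops to zero, which is the kind of trade-off between allocation and value that a scheduler under a capacity limit has to resolve.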



Published In

SPAA '14: Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures
June 2014
356 pages
ISBN:9781450328210
DOI:10.1145/2612669
  • General Chair: Guy Blelloch
  • Program Chair: Peter Sanders
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big data
  2. deadline-aware scheduling
  3. scheduling algorithms

Qualifiers

  • Abstract

Conference

SPAA '14

Acceptance Rates

SPAA '14 Paper Acceptance Rate: 30 of 122 submissions, 25%
Overall Acceptance Rate: 447 of 1,461 submissions, 31%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 11
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 03 Mar 2025

Citations

Cited By

  • (2021) The cosmos big data platform at Microsoft. Proceedings of the VLDB Endowment 14:12 (3148-3161). DOI: 10.14778/3476311.3476390. Online publication date: 1-Jul-2021
  • (2021) An energy-aware scheduling of dynamic workflows using big data similarity statistical analysis in cloud computing. The Journal of Supercomputing. DOI: 10.1007/s11227-021-04016-8. Online publication date: 27-Aug-2021
  • (2020) A Review of Dynamic Scalability and Dynamic Scheduling in Cloud-Native Distributed Stream Processing Systems. ICDSMLA 2019 (1539-1553). DOI: 10.1007/978-981-15-1420-3_161. Online publication date: 19-May-2020
  • (2019) Minimum Makespan Workflow Scheduling for Malleable Jobs with Precedence Constraints and Lifetime Resource Demands. 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS) (2068-2078). DOI: 10.1109/ICDCS.2019.00204. Online publication date: Jul-2019
  • (2019) An Algorithmic Framework for Geo-Distributed Analytics. Computational Intelligence and Intelligent Systems (89-105). DOI: 10.1007/978-3-030-10880-9_6. Online publication date: 8-Feb-2019
  • (2018) An Improved Approximation for Scheduling Malleable Tasks with Precedence Constraints via Iterative Method. IEEE Transactions on Parallel and Distributed Systems 29:9 (1937-1946). DOI: 10.1109/TPDS.2018.2813387. Online publication date: 1-Sep-2018
  • (2017) Efficient Approximation Algorithms for the Bounded Flexible Scheduling Problem in Clouds. IEEE Transactions on Parallel and Distributed Systems 28:12 (3511-3520). DOI: 10.1109/TPDS.2017.2731843. Online publication date: 1-Dec-2017
  • (2017) Task Scheduling in Big Data - Review, Research Challenges, and Prospects. 2017 Ninth International Conference on Advanced Computing (ICoAC) (165-173). DOI: 10.1109/ICoAC.2017.8441494. Online publication date: Dec-2017
  • (2015) Truthful Online Scheduling with Commitments. Proceedings of the Sixteenth ACM Conference on Economics and Computation (715-732). DOI: 10.1145/2764468.2764535. Online publication date: 15-Jun-2015
  • (2015) Near-Optimal Scheduling Mechanisms for Deadline-Sensitive Jobs in Large Computing Clusters. ACM Transactions on Parallel Computing 2:1 (1-29). DOI: 10.1145/2742343. Online publication date: 13-Apr-2015
