DOI: 10.1145/2785956.2787488
Research Article | Free Access

Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can

Published: 17 August 2015

Abstract

To reduce the impact of network congestion on big data jobs, cluster management frameworks use various heuristics to schedule compute tasks and/or network flows. Most of these schedulers treat the job input data as fixed and greedily schedule the tasks and flows that are ready to run. However, a large fraction of production jobs are recurring with predictable characteristics, which allows us to plan ahead for them. Coordinating the placement of data and tasks of these jobs significantly improves their network locality and frees up bandwidth, which can be used by other jobs running on the cluster. With this intuition, we develop Corral, a scheduling framework that uses characteristics of future workloads to determine an offline schedule which (i) jointly places data and compute to achieve better data locality, and (ii) isolates jobs both spatially (by scheduling them in different parts of the cluster) and temporally, improving their performance. We implement Corral on Apache YARN and evaluate it on a 210-machine cluster using production workloads. Compared to YARN's capacity scheduler, Corral reduces the makespan of these workloads by up to 33% and the median job completion time by up to 56%, with a 20-90% reduction in data transferred across racks.
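
To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch of offline joint data-and-compute placement for recurring jobs. It is not Corral's actual algorithm (the paper formulates and solves a richer offline scheduling problem); the Job and place_jobs names, the slot model, and the greedy heuristic are all illustrative assumptions.

```python
# Illustrative sketch only -- NOT Corral's algorithm. It captures the
# intuition from the abstract: because recurring jobs are predictable,
# a planner can decide ahead of time which racks will hold both a job's
# input data and its tasks, making reads and shuffles rack-local and
# keeping jobs largely on disjoint racks (spatial isolation).

from dataclasses import dataclass

@dataclass
class Job:                       # hypothetical job profile
    name: str
    task_slots: int              # predicted compute demand (task slots)

def place_jobs(jobs, racks, slots_per_rack):
    """Greedily assign each job a set of racks; the job's input data is
    then written to those racks and its tasks are constrained to them."""
    free = {r: slots_per_rack for r in racks}
    plan = {}
    # Place the largest jobs first so they get contiguous rack sets.
    for job in sorted(jobs, key=lambda j: j.task_slots, reverse=True):
        chosen, needed = [], job.task_slots
        # Prefer the emptiest racks, so distinct jobs tend to land on
        # disjoint racks and stay isolated from one another.
        for r in sorted(free, key=free.get, reverse=True):
            if needed <= 0:
                break
            if free[r] > 0:
                take = min(free[r], needed)
                free[r] -= take
                needed -= take
                chosen.append(r)
        plan[job.name] = chosen  # data AND compute both go here
    return plan

if __name__ == "__main__":
    jobs = [Job("daily-etl", 40), Job("hourly-report", 8)]
    racks = [f"rack{i}" for i in range(4)]
    print(place_jobs(jobs, racks, slots_per_rack=16))
    # e.g. {'daily-etl': ['rack0', 'rack1', 'rack2'], 'hourly-report': ['rack3']}
```

Because recurring jobs make input sizes and task counts predictable, a plan like this can be computed before the jobs arrive; an online scheduler then only needs to enforce the precomputed placement.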

Supplementary Material

WEBM File (p407-jalaparti.webm)

Published In

SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication
August 2015
684 pages
ISBN:9781450335423
DOI:10.1145/2785956
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 August 2015

Author Tags

  1. cluster schedulers
  2. cross-layer optimization
  3. data-intensive applications
  4. joint data and compute placement

Qualifiers

  • Research-article

Conference

SIGCOMM '15: ACM SIGCOMM 2015 Conference
Sponsor: ACM SIGCOMM
August 17-21, 2015
London, United Kingdom

Acceptance Rates

SIGCOMM '15 paper acceptance rate: 40 of 242 submissions, 17%
Overall acceptance rate: 462 of 3,389 submissions, 14%

Cited By

  • (2024) "When will my ML job finish? Toward providing completion time estimates through predictability-centric scheduling." Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation (OSDI), pp. 487-505. DOI: 10.5555/3691938.3691964. Online publication date: 10-Jul-2024.
  • (2024) "Zero-sided RDMA: Network-driven Data Shuffling for Disaggregated Heterogeneous Cloud DBMSs." Proceedings of the ACM on Management of Data, 2(1), pp. 1-28. DOI: 10.1145/3639291. Online publication date: 26-Mar-2024.
  • (2024) "Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing." 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), pp. 1-10. DOI: 10.1109/IWQoS61813.2024.10682877. Online publication date: 19-Jun-2024.
  • (2023) "CarbonScaler: Leveraging Cloud Workload Elasticity for Optimizing Carbon-Efficiency." Proceedings of the ACM on Measurement and Analysis of Computing Systems, 7(3), pp. 1-28. DOI: 10.1145/3626788. Online publication date: 7-Dec-2023.
  • (2023) "Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs." Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS), pp. 457-472. DOI: 10.1145/3575693.3575705. Online publication date: 27-Jan-2023.
  • (2023) "Cougar: A General Framework for Jobs Optimization In Cloud." 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 3417-3429. DOI: 10.1109/ICDE55515.2023.00262. Online publication date: Apr-2023.
  • (2023) "An Efficient Approach for Resilience and Reliability Against Cascading Failure." 2023 15th International Conference on Developments in eSystems Engineering (DeSE), pp. 71-76. DOI: 10.1109/DeSE58274.2023.10100283. Online publication date: 9-Jan-2023.
  • (2023) "Dynamic Resource Management for Machine Learning Pipeline Workloads." SN Computer Science, 4(5). DOI: 10.1007/s42979-023-02101-8. Online publication date: 30-Aug-2023.
  • (2023) "Mixtran: an efficient and fair scheduler for mixed deep learning workloads in heterogeneous GPU environments." Cluster Computing, 27(3), pp. 2775-2784. DOI: 10.1007/s10586-023-04104-9. Online publication date: 12-Aug-2023.
  • (2022) "PushBox: Making Use of Every Bit of Time to Accelerate Completion of Data-Parallel Jobs." IEEE Transactions on Parallel and Distributed Systems, 33(12), pp. 4256-4269. DOI: 10.1109/TPDS.2022.3182037. Online publication date: 1-Dec-2022.
