ABSTRACT
MapReduce and Hadoop represent an economically compelling alternative for efficient large-scale data processing and advanced analytics in the enterprise. A key challenge in shared MapReduce clusters is the ability to automatically tailor and control resource allocations to different applications so that they achieve their performance goals. Currently, no job scheduler for MapReduce environments can, given a job completion deadline, allocate the appropriate amount of resources to the job so that it meets the required Service Level Objective (SLO). In this work, we propose a framework, called ARIA, to address this problem. It comprises three inter-related components. First, for a production job that is routinely executed on new datasets, we build a job profile that compactly summarizes critical performance characteristics of the underlying application during the map and reduce stages. Second, we design a MapReduce performance model that, for a given job (with a known profile) and its SLO (soft deadline), estimates the amount of resources required for job completion within the deadline. Finally, we implement a novel SLO-based scheduler in Hadoop that determines job ordering and the amount of resources to allocate for meeting the job deadlines.
We validate our approach using a set of realistic applications. The new scheduler effectively meets the jobs' SLOs until the job demands exceed the cluster resources. The results of the extensive simulation study are validated through detailed experiments on a 66-node Hadoop cluster.
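The resource estimation described above can be illustrated with a simplified sketch. The idea follows Graham-style makespan bounds for n independent tasks on k slots: the completion time lies between (n/k)·avg and ((n-1)/k)·avg + max, where avg and max are the average and maximum task durations from the job profile. Inverting the upper bound gives a conservative slot count for a deadline. The function names and the single-stage model below are illustrative assumptions, not ARIA's exact formulation, which handles map and reduce stages separately:

```python
import math

def makespan_bounds(n_tasks, avg, mx, slots):
    """Lower/upper bounds on completion time of n_tasks independent tasks
    (average duration avg, maximum duration mx) on `slots` slots."""
    low = (n_tasks / slots) * avg
    up = ((n_tasks - 1) / slots) * avg + mx
    return low, up

def min_slots_for_deadline(n_tasks, avg, mx, deadline):
    """Smallest slot count whose upper-bound makespan meets the deadline;
    None if the deadline is infeasible (shorter than the longest task)."""
    if deadline <= mx:
        return None
    # Solve ((n_tasks - 1) / k) * avg + mx <= deadline for k.
    k = math.ceil((n_tasks - 1) * avg / (deadline - mx))
    return max(k, 1)
```

For example, 100 tasks with avg = 10s and max = 20s against a 120s deadline yield a conservative allocation of 10 slots, since ((100-1)/10)·10 + 20 = 119s ≤ 120s, while 9 slots would bound completion at 130s.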