survey

Classification Framework of MapReduce Scheduling Algorithms

Authors:

Santonu Sarkar,

Maria Indrawan

ACM Computing Surveys (CSUR), Volume 47, Issue 3

Article No.: 49, Pages 1 - 38

https://doi.org/10.1145/2693315

Published: 16 April 2015

Abstract

A MapReduce scheduling algorithm plays a critical role in managing large clusters of hardware nodes and meeting multiple quality requirements by controlling the order and distribution of user, job, and task execution. This survey presents a comprehensive and structured review of the scheduling algorithms proposed so far, using a novel multidimensional classification framework. The dimensions are (i) meeting quality requirements, (ii) scheduling entities, and (iii) adapting to dynamic environments; each dimension has its own taxonomy. An empirical evaluation framework for these algorithms is recommended. The survey also identifies open issues and directions for future research.

Supplementary Material

a49-tiwari-apndx.pdf (tiwari.zip)

Supplemental movie, appendix, image, and software files for Classification Framework of MapReduce Scheduling Algorithms


Information

Published In

ACM Computing Surveys Volume 47, Issue 3

April 2015

602 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/2737799

Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 April 2015

Accepted: 01 December 2014

Revised: 01 October 2014

Received: 01 January 2014

Qualifiers

Survey
Research
Refereed
