survey

Classification Framework of MapReduce Scheduling Algorithms

Published: 16 April 2015

Abstract

A MapReduce scheduling algorithm plays a critical role in managing large clusters of hardware nodes and meeting multiple quality requirements by controlling the order and distribution of user, job, and task execution. A comprehensive and structured survey of the scheduling algorithms proposed so far is presented here using a novel multidimensional classification framework. Its dimensions are (i) meeting quality requirements, (ii) scheduling entities, and (iii) adapting to dynamic environments; each dimension has its own taxonomy. An empirical evaluation framework for these algorithms is recommended. This survey identifies various open issues and directions for future research.

Supplementary Material

a49-tiwari-apndx.pdf (tiwari.zip)
Supplemental movie, appendix, image, and software files for Classification Framework of MapReduce Scheduling Algorithms

Information & Contributors

Information

Published In

ACM Computing Surveys  Volume 47, Issue 3
April 2015
602 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/2737799
Editor: Sartaj Sahni
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 April 2015
Accepted: 01 December 2014
Revised: 01 October 2014
Received: 01 January 2014
Published in CSUR Volume 47, Issue 3

Author Tags

  1. Distributed computing
  2. Hadoop
  3. MapReduce
  4. big-data
  5. distributed data
  6. scheduling

Qualifiers

  • Survey
  • Research
  • Refereed

Cited By

  • (2024) Assessment of soil fertility in Xinjiang oasis cotton field based on big data techniques. Big Data Research 37 (100480). DOI: 10.1016/j.bdr.2024.100480. Online publication date: Aug-2024
  • (2023) MapReduce scheduling algorithms in Hadoop: a systematic study. Journal of Cloud Computing: Advances, Systems and Applications 12:1. DOI: 10.1186/s13677-023-00520-9. Online publication date: 10-Oct-2023
  • (2022) Total weighted tardiness for scheduling MapReduce jobs on parallel batch machines. Journal of Industrial and Management Optimization (0). DOI: 10.3934/jimo.2022201. Online publication date: 2022
  • (2022) Energy Utilization Task Scheduling for MapReduce in Heterogeneous Clusters. IEEE Transactions on Services Computing 15:2 (931-944). DOI: 10.1109/TSC.2020.2966697. Online publication date: 1-Mar-2022
  • (2022) A classification framework for straggler mitigation and management in a heterogeneous Hadoop cluster: A state-of-art survey. Journal of King Saud University - Computer and Information Sciences 34:9 (7621-7644). DOI: 10.1016/j.jksuci.2022.02.021. Online publication date: Oct-2022
  • (2022) A YARN-based Energy-Aware Scheduling Method for Big Data Applications under Deadline Constraints. Journal of Grid Computing 20:4. DOI: 10.1007/s10723-022-09627-w. Online publication date: 1-Dec-2022
  • (2022) HTD: heterogeneous throughput-driven task scheduling algorithm in MapReduce. Distributed and Parallel Databases 40:1 (135-163). DOI: 10.1007/s10619-021-07375-6. Online publication date: 1-Mar-2022
  • (2021) Big Data Resource Management & Networks: Taxonomy, Survey, and Future Directions. IEEE Communications Surveys & Tutorials 23:4 (2098-2130). DOI: 10.1109/COMST.2021.3094993. Online publication date: Dec-2022
  • (2021) SPO: A Secure and Performance-aware Optimization for MapReduce Scheduling. Journal of Network and Computer Applications 176 (102944). DOI: 10.1016/j.jnca.2020.102944. Online publication date: Feb-2021
  • (2021) Historical data based approach for straggler avoidance in a heterogeneous Hadoop cluster. Journal of Ambient Intelligence and Humanized Computing. DOI: 10.1007/s12652-020-02699-0. Online publication date: 8-Feb-2021
