Guidelines for Selecting Hadoop Schedulers Based on System Heterogeneity

Rasooli, Aysan; Down, Douglas G.

doi:10.1007/s10723-014-9299-2

Published: 22 July 2014

Journal of Grid Computing, Volume 12, pages 499–519 (2014)

Aysan Rasooli¹ & Douglas G. Down¹

Abstract

Hadoop was developed to run large-scale data-parallel applications in cloud computing environments. A Hadoop system can be described by three factors: cluster, workload, and user. Each factor is either heterogeneous or homogeneous, and together these factors determine the heterogeneity level of the Hadoop system. This paper studies how heterogeneity in each of these factors affects the performance of Hadoop schedulers. Three schedulers that account for different levels of Hadoop heterogeneity are used in the analysis: FIFO, Fair sharing, and COSHH (Classification and Optimization based Scheduler for Heterogeneous Hadoop). Performance issues for Hadoop schedulers are identified, and experiments are presented to evaluate them. Based on the reported results, guidelines are proposed for selecting an appropriate scheduler for a given Hadoop system. Finally, the proposed guidelines are evaluated in different Hadoop systems.
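
Because each of the three factors is either homogeneous or heterogeneous, the description implies a small, finite space of system types (2^3 = 8). The following is a minimal Python sketch of that space, assuming each factor is a simple boolean and assuming, hypothetically, that a system's heterogeneity level can be scored as the number of heterogeneous factors. The names `HadoopSystem`, `heterogeneity_level`, and `all_system_types` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from itertools import product

# The three factors named in the abstract.
FACTORS = ("cluster", "workload", "user")


@dataclass(frozen=True)
class HadoopSystem:
    """One combination of homogeneous (False) / heterogeneous (True) factors."""
    cluster_heterogeneous: bool
    workload_heterogeneous: bool
    user_heterogeneous: bool

    def heterogeneity_level(self) -> int:
        # Hypothetical scoring (an assumption, not the paper's definition):
        # count how many of the three factors are heterogeneous (0 to 3).
        return sum((self.cluster_heterogeneous,
                    self.workload_heterogeneous,
                    self.user_heterogeneous))


def all_system_types() -> list[HadoopSystem]:
    """Enumerate every homogeneous/heterogeneous combination of the factors."""
    return [HadoopSystem(*flags)
            for flags in product((False, True), repeat=len(FACTORS))]


if __name__ == "__main__":
    for system in all_system_types():
        print(system, "level =", system.heterogeneity_level())
```

The count of eight follows directly from three binary factors; which of these configurations the paper actually evaluates is not stated in the abstract.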

Author information

Authors and Affiliations

Department of Computing and Software, McMaster University, Hamilton, ON L8S 4K1, Canada
Aysan Rasooli & Douglas G. Down

Corresponding author

Correspondence to Douglas G. Down.

About this article

Cite this article

Rasooli, A., Down, D.G. Guidelines for Selecting Hadoop Schedulers Based on System Heterogeneity. J Grid Computing 12, 499–519 (2014). https://doi.org/10.1007/s10723-014-9299-2

Received: 04 January 2013
Accepted: 26 March 2014
Published: 22 July 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s10723-014-9299-2
