
Guidelines for Selecting Hadoop Schedulers Based on System Heterogeneity

Journal of Grid Computing

Abstract

Hadoop was developed as a solution for running large-scale data-parallel applications in Cloud computing. A Hadoop system can be described in terms of three factors: cluster, workload, and user. Each factor is either heterogeneous or homogeneous, and this reflects the heterogeneity level of the Hadoop system. This paper studies how heterogeneity in each of these factors affects the performance of Hadoop schedulers. Three schedulers that consider different levels of Hadoop heterogeneity are used for the analysis: FIFO, Fair sharing, and COSHH (Classification and Optimization based Scheduler for Heterogeneous Hadoop). Performance issues for Hadoop schedulers are introduced, and experiments are provided to evaluate these issues. The reported results suggest guidelines for selecting an appropriate scheduler for Hadoop systems. Finally, the proposed guidelines are evaluated in different Hadoop systems.



Author information

Corresponding author

Correspondence to Douglas G. Down.


About this article


Cite this article

Rasooli, A., Down, D.G. Guidelines for Selecting Hadoop Schedulers Based on System Heterogeneity. J Grid Computing 12, 499–519 (2014). https://doi.org/10.1007/s10723-014-9299-2


  • DOI: https://doi.org/10.1007/s10723-014-9299-2
