Abstract
A significant amount of research on job scheduling has been carried out in Hadoop. However, further research is still needed to overcome several challenges in scheduling jobs on Hadoop clusters. Various factors affect the performance of scheduling policies, such as data volume (storage), data source format (data variety), speed (data rate), security and privacy, cost, connectivity and data sharing. Scheduling policies have been designed to achieve better resource utilization and to manage big data. In this paper, we present an algorithm that runs on heterogeneous Hadoop clusters and executes jobs in parallel. The algorithm first distributes data according to the performance of the nodes and then schedules jobs according to their execution cost, thereby reducing the cost of executing them. Compared with the FIFO and Fair schedulers, the proposed algorithm achieves better execution time, cost and data locality.
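The two phases named above can be illustrated with a minimal Python sketch. It is not the authors' algorithm. Several pieces are assumptions chosen for illustration: the per-node performance score (`perf`), the timing model in which a job's running time is its cost divided by `perf`, the largest-remainder split of data blocks, and the cheapest-job-first greedy placement. The names `Node`, `distribute_blocks` and `schedule_jobs` are likewise hypothetical.

```python
"""Hypothetical two-phase scheduler sketch for a heterogeneous cluster.

Phase 1: split data blocks across nodes in proportion to an assumed
per-node performance score (largest-remainder rounding).
Phase 2: order jobs by an assumed execution cost (cheapest first) and
greedily place each job on the node that would finish it earliest.
"""
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    perf: float               # assumed relative speed (higher is faster)
    blocks: int = 0           # data blocks placed on this node
    busy_until: float = 0.0   # time at which the node becomes free
    jobs: list = field(default_factory=list)


def distribute_blocks(nodes, total_blocks):
    """Assign blocks in proportion to node performance (largest remainder)."""
    total_perf = sum(n.perf for n in nodes)
    shares = [total_blocks * n.perf / total_perf for n in nodes]
    for n, s in zip(nodes, shares):
        n.blocks = int(s)  # integer part of each node's fair share
    leftover = total_blocks - sum(n.blocks for n in nodes)
    # Hand the remaining blocks to the nodes with the largest fractional parts.
    order = sorted(range(len(nodes)),
                   key=lambda i: shares[i] - int(shares[i]),
                   reverse=True)
    for i in order[:leftover]:
        nodes[i].blocks += 1
    return {n.name: n.blocks for n in nodes}


def schedule_jobs(nodes, jobs):
    """jobs maps a job name to its assumed cost in abstract work units."""
    placement = {}
    for job, cost in sorted(jobs.items(), key=lambda kv: kv[1]):
        # Assumed model: running time on a node is cost / perf.
        best = min(nodes, key=lambda n: n.busy_until + cost / n.perf)
        best.busy_until += cost / best.perf
        best.jobs.append(job)
        placement[job] = best.name
    makespan = max(n.busy_until for n in nodes)
    return placement, makespan


if __name__ == "__main__":
    cluster = [Node("fast", 4.0), Node("mid", 2.0), Node("slow", 1.0)]
    print(distribute_blocks(cluster, 100))
    print(schedule_jobs(cluster, {"a": 8, "b": 4, "c": 2, "d": 6}))
```

With three nodes whose scores are 4, 2 and 1, `distribute_blocks(cluster, 100)` yields 57, 29 and 14 blocks. The greedy pass then places each job where it would complete earliest, given the load already assigned. A real scheduler would also weigh data locality, which this sketch ignores.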
Acknowledgements
This paper is extracted from a PhD thesis entitled "Improve of scheduling in Hadoop clusters", supervised by Dr Yaghoubian, Dr BagheriFard, Dr Nejatian and Dr Parvin.
Cite this article
Javanmardi, A.K., Yaghoubyan, S.H., Bagherifard, K. et al. A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems. J Supercomput 77, 1–22 (2021). https://doi.org/10.1007/s11227-020-03256-4