A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems

Javanmardi, Abdol Karim; Yaghoubyan, S. Hadi; Bagherifard, Karamollah; Nejatian, Samad; Parvin, Hamid

doi:10.1007/s11227-020-03256-4

Published: 19 March 2020

Volume 77, pages 1–22 (2021)

The Journal of Supercomputing

Abstract

A significant amount of job-scheduling research has been carried out on Hadoop. However, several challenges in scheduling jobs on Hadoop clusters still call for further research. Many factors affect the performance of scheduling policies, such as data volume (storage), data source format (heterogeneous data), speed (data rate), security and privacy, cost, and connectivity and data sharing. Scheduling policies have been designed to improve resource utilization and to manage big data. This paper presents an algorithm that runs on heterogeneous Hadoop clusters and executes jobs in parallel. The algorithm first distributes data according to the performance of each node and then schedules jobs according to their execution cost, reducing the cost of executing them. Compared with the FIFO and Fair schedulers, the proposed algorithm performs better in terms of execution time, cost and data locality.
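The abstract describes a two-phase approach: place data in proportion to node performance, then assign jobs by execution cost. The sketch below illustrates that general idea under assumptions that are ours, not the paper's: each node is summarized by a hypothetical relative `speed`, a hypothetical `price` per second of use and a fixed number of task `slots`; data blocks are split in proportion to `speed`; and each job is then placed on the cheapest node that still has a free slot. The authors' actual cost model and their notion of a "unit" are not described in this abstract, so this is an illustration of the technique, not their scheduler.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float  # hypothetical relative processing rate (work units per second)
    price: float  # hypothetical cost per second of occupying the node
    slots: int    # hypothetical number of concurrent task slots


def distribute_blocks(num_blocks, nodes):
    """Phase 1: give each node a share of the data blocks proportional to its speed."""
    total_speed = sum(n.speed for n in nodes)
    quotas = [num_blocks * n.speed / total_speed for n in nodes]
    counts = [int(q) for q in quotas]
    # Largest-remainder rounding so the counts sum exactly to num_blocks.
    leftover = num_blocks - sum(counts)
    by_remainder = sorted(range(len(nodes)),
                          key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return {n.name: c for n, c in zip(nodes, counts)}


def execution_cost(work, node):
    """Cost of running `work` units on `node`: runtime multiplied by the node's price."""
    return (work / node.speed) * node.price


def schedule_by_cost(jobs, nodes):
    """Phase 2: place each job on the cheapest node that still has a free slot.

    Jobs are considered largest first, so the biggest jobs get first pick of
    the cheapest capacity. Returns a list of (job, node, cost) tuples.
    """
    free = {n.name: n.slots for n in nodes}
    plan = []
    for job, work in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [n for n in nodes if free[n.name] > 0]
        if not candidates:
            raise RuntimeError(f"no free slot for job {job!r}")
        best = min(candidates, key=lambda n: execution_cost(work, n))
        free[best.name] -= 1
        plan.append((job, best.name, execution_cost(work, best)))
    return plan


# Usage with made-up numbers (illustrative only, not data from the paper).
nodes = [Node("fast", speed=4.0, price=3.0, slots=2),
         Node("medium", speed=2.0, price=1.0, slots=2),
         Node("slow", speed=1.0, price=0.8, slots=2)]
jobs = {"j1": 8.0, "j2": 4.0, "j3": 2.0, "j4": 6.0}

print(distribute_blocks(10, nodes))  # → {'fast': 6, 'medium': 3, 'slow': 1}
print(schedule_by_cost(jobs, nodes))
# → [('j1', 'medium', 4.0), ('j4', 'medium', 3.0), ('j2', 'fast', 3.0), ('j3', 'fast', 1.5)]
```

Ranking nodes by cost per unit of work (price divided by speed) means a fast node is not automatically preferred: in the example the "medium" node is cheapest per unit, so it fills first, and the slot limit is what pushes later jobs onto other nodes. A real Hadoop scheduler would also have to weigh data locality and queueing delay, which this sketch ignores.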

Acknowledgements

This paper is drawn from a PhD thesis entitled "Improve of scheduling in Hadoop clusters", supervised by Dr Yaghoubyan, Dr Bagherifard, Dr Nejatian and Dr Parvin.

Author information

Authors and Affiliations

Department of Computer Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
Abdol Karim Javanmardi, S. Hadi Yaghoubyan & Karamollah Bagherifard
Department of Electrical Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
Samad Nejatian
Young Researchers and Elite Club, Yasooj Branch, Islamic Azad University, Yasooj, Iran
S. Hadi Yaghoubyan, Karamollah Bagherifard & Samad Nejatian
Department of Computer Engineering, Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, Iran
Hamid Parvin
Young Researchers and Elite Club, Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, Iran
Hamid Parvin

Corresponding author

Correspondence to S. Hadi Yaghoubyan.

About this article

Cite this article

Javanmardi, A.K., Yaghoubyan, S.H., Bagherifard, K. et al. A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems. J Supercomput 77, 1–22 (2021). https://doi.org/10.1007/s11227-020-03256-4

Issue Date: January 2021
