An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems

Javanmardi, Abdol Karim; Yaghoubyan, S. Hadi; BagheriFard, Karamollah; Nejatian, Samad; Parvin, Hamid

doi:10.1007/s11227-020-03487-5

Published: 05 November 2020

Volume 77, pages 5289–5318, (2021)

The Journal of Supercomputing

Abdol Karim Javanmardi^1,
S. Hadi Yaghoubyan^1,3,
Karamollah BagheriFard^1,3,
Samad Nejatian^2,3 &
…
Hamid Parvin^4,5,6


Abstract

Job scheduling in Hadoop has been investigated in several studies. However, some challenges facing Hadoop clusters, including minimum share (min-share), cluster heterogeneity, execution time estimation, and scheduler program size, have received less attention. One of the most important min-share algorithms is the FAIR scheduler, which Facebook Inc. developed for its own needs and which assigns an equal min-share to every user. This article proposes a method intended to improve on existing methods in automation and configuration, performance optimization, fairness, and data locality. A high-level architectural model is designed first, and a scheduler is then defined on that model. The scheduler contains four components: three schedule jobs, and one distributes the data for each job among the nodes. The scheduler can run on heterogeneous Hadoop clusters and execute jobs in parallel, with a different min-share assignable to each job or user. Moreover, an approach is presented for each of the problems associated with min-share, cluster heterogeneity, execution time estimation, and scheduler program size. Each of these approaches can also be used on its own to improve the performance of other scheduling algorithms. The scheduler presented in this paper showed acceptable performance compared with the First-In, First-Out (FIFO) and FAIR schedulers.
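To make the idea of disparate min-shares concrete, the following is a minimal sketch of a slot allocator that first honors a per-job min-share and then shares the leftover slots. It is not the scheduler proposed in the article, whose internals are not described in the abstract; the names Job, demand, min_share, and allocate_slots are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    demand: int      # slots the job could use right now (assumed known)
    min_share: int   # guaranteed slots; may differ per job (hypothetical setting)


def allocate_slots(jobs, total_slots):
    """Return {job name: slots}: satisfy min-shares first, then share the rest."""
    alloc = {}
    remaining = total_slots

    # Pass 1: grant each job its min-share, capped by its demand and by
    # the slots still available.
    for job in jobs:
        granted = min(job.min_share, job.demand, remaining)
        alloc[job.name] = granted
        remaining -= granted

    # Pass 2: hand out leftover slots one at a time to the job that holds
    # the fewest slots and still has unmet demand.
    while remaining > 0:
        hungry = [j for j in jobs if alloc[j.name] < j.demand]
        if not hungry:
            break
        neediest = min(hungry, key=lambda j: alloc[j.name])
        alloc[neediest.name] += 1
        remaining -= 1

    return alloc


jobs = [
    Job("A", demand=10, min_share=6),
    Job("B", demand=10, min_share=2),
    Job("C", demand=1, min_share=0),
]
print(allocate_slots(jobs, total_slots=10))
```

With ten slots, job A receives its larger guarantee before any leftover capacity is shared, which is the behavior an equal min-share for all users cannot express. The sketch treats all slots as identical, so it does not address cluster heterogeneity or data locality.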



Acknowledgements

This paper has been extracted from a PhD thesis entitled “Improvement of scheduling in Hadoop clusters”, written under the supervision of Dr Yaghoubyan, Dr BagheriFard, Dr Nejatian, and Dr Parvin.

Author information

Authors and Affiliations

Department of Computer Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
Abdol Karim Javanmardi, S. Hadi Yaghoubyan & Karamollah BagheriFard
Department of Electrical Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
Samad Nejatian
Young Researchers and Elite Club, Yasooj Branch, Islamic Azad University, Yasooj, Iran
S. Hadi Yaghoubyan, Karamollah BagheriFard & Samad Nejatian
Institute of Research and Development, Duy Tan University, Da Nang, 550000, Vietnam
Hamid Parvin
Faculty of Information Technology, Duy Tan University, Da Nang, 550000, Vietnam
Hamid Parvin
Department of Computer Engineering, Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, Iran
Hamid Parvin


Corresponding author

Correspondence to S. Hadi Yaghoubyan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Javanmardi, A.K., Yaghoubyan, S.H., BagheriFard, K. et al. An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems. J Supercomput 77, 5289–5318 (2021). https://doi.org/10.1007/s11227-020-03487-5


Accepted: 22 October 2020
Published: 05 November 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11227-020-03487-5
