An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems

The Journal of Supercomputing

Abstract

Job scheduling in Hadoop has been investigated in several studies. However, some challenges facing Hadoop clusters, including minimum share (min-share), cluster heterogeneity, execution time estimation, and scheduler program size, have received less attention. One of the most important algorithms with regard to min-share is the FAIR scheduler, which Facebook Inc. developed for its own needs and which assigns an equal min-share to every user. This article attempts to improve on existing methods through automation and configuration, performance optimization, fairness, and data locality. A high-level architectural model is designed first, and a scheduler is then defined on this model. The scheduler contains four components: three schedule jobs, and one distributes the data of each job among the nodes. The scheduler can run on heterogeneous Hadoop clusters and execute jobs in parallel, and a different min-share can be assigned to each job or user. Moreover, an approach is presented for each of the problems associated with min-share, cluster heterogeneity, execution time estimation, and scheduler program size. Each of these approaches can also be used on its own to improve the performance of other scheduling algorithms. The scheduler presented in this paper showed acceptable performance compared with the First-In, First-Out (FIFO) and FAIR schedulers.
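To make the idea of disparate min-shares concrete, the following is a minimal sketch in Python. It is not the scheduler proposed in the article, whose algorithm the abstract does not specify. It only illustrates the general pattern of first granting each job its guaranteed minimum and then dividing the leftover capacity fairly. The function name `allocate_slots`, its parameters, and the slot-based resource model are all assumptions made for illustration, and cluster heterogeneity is not modelled.

```python
from typing import Dict


def allocate_slots(total_slots: int,
                   demand: Dict[str, int],
                   min_share: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical helper: min-share first, then even sharing of the rest."""
    # Step 1: give every job its guaranteed minimum, capped by its demand.
    alloc = {job: min(demand[job], min_share.get(job, 0)) for job in demand}
    if sum(alloc.values()) > total_slots:
        raise ValueError("min-shares exceed cluster capacity")

    # Step 2: hand out the remaining slots one at a time to the job that
    # currently holds the fewest slots and still has unmet demand.
    remaining = total_slots - sum(alloc.values())
    while remaining > 0:
        hungry = [job for job in demand if alloc[job] < demand[job]]
        if not hungry:
            break  # every job is fully served; leave the rest idle
        job = min(hungry, key=lambda j: (alloc[j], j))  # ties broken by name
        alloc[job] += 1
        remaining -= 1
    return alloc


if __name__ == "__main__":
    print(allocate_slots(10,
                         demand={"a": 8, "b": 8, "c": 2},
                         min_share={"a": 5, "b": 1, "c": 1}))
```

With 10 slots, demands of 8, 8, and 2, and min-shares of 5, 1, and 1, this sketch yields `{'a': 5, 'b': 3, 'c': 2}`: job `a` keeps its larger guarantee, while the three leftover slots go to the jobs holding the fewest. An equal min-share for every user, as the abstract attributes to the FAIR scheduler, corresponds to passing the same value for every key of `min_share`.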

Acknowledgements

This paper has been extracted from a PhD thesis entitled “Improvement of scheduling in Hadoop clusters”, written under the supervision of Dr Yaghoubyan, Dr BagheriFard, Dr Nejatian, and Dr Parvin.

Author information

Corresponding author

Correspondence to S. Hadi Yaghoubyan.

About this article

Cite this article

Javanmardi, A.K., Yaghoubyan, S.H., BagheriFard, K. et al. An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems. J Supercomput 77, 5289–5318 (2021). https://doi.org/10.1007/s11227-020-03487-5

  • DOI: https://doi.org/10.1007/s11227-020-03487-5
