A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems

The Journal of Supercomputing

Abstract

A significant amount of research has been carried out on job scheduling in Hadoop. However, further research is still needed to overcome several challenges in scheduling jobs on Hadoop clusters. Various factors affect the performance of scheduling policies, such as data volume (storage), data source format (different data), speed (data rate), security and privacy, cost, connection, and data sharing. Scheduling policies have been designed to achieve better utilization of resources and to manage big data. In this paper, an algorithm is presented that runs on heterogeneous Hadoop clusters and executes jobs in parallel. The algorithm first distributes data based on the performance of the nodes and then schedules the jobs according to their cost of execution, thereby decreasing the cost of executing the jobs. The presented algorithm offers better performance in terms of execution time, cost, and locality than the FIFO and Fair schedulers.
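The two phases named in the abstract (performance-based data distribution, then cost-based scheduling) can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify. The `Node` and `Job` types, the performance score, the per-second price, the transfer penalty, and the cost formula are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    performance: float      # hypothetical relative speed score
    cost_per_second: float  # hypothetical price of running on this node


@dataclass
class Job:
    name: str
    work_units: float  # hypothetical amount of computation
    data_node: str     # node that holds the job's input data


def distribute_blocks(nodes, total_blocks):
    """Split data blocks in proportion to node performance.

    Uses largest-remainder rounding so the shares sum to total_blocks.
    """
    total_perf = sum(n.performance for n in nodes)
    exact = [total_blocks * n.performance / total_perf for n in nodes]
    shares = [int(x) for x in exact]
    leftover = total_blocks - sum(shares)
    by_remainder = sorted(
        range(len(nodes)), key=lambda i: exact[i] - shares[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return {n.name: s for n, s in zip(nodes, shares)}


def execution_cost(job, node, transfer_penalty=5.0):
    """Assumed cost model: run time times price, plus a penalty if data is remote."""
    compute = job.work_units / node.performance * node.cost_per_second
    remote = 0.0 if node.name == job.data_node else transfer_penalty
    return compute + remote


def schedule_by_cost(jobs, nodes):
    """Pick the cheapest node for each job, then order jobs by ascending cost."""
    plan = []
    for job in jobs:
        best = min(nodes, key=lambda n: execution_cost(job, n))
        plan.append((job.name, best.name, execution_cost(job, best)))
    return sorted(plan, key=lambda entry: entry[2])


if __name__ == "__main__":
    cluster = [Node("fast", 4.0, 3.0), Node("slow", 1.0, 1.0)]
    jobs = [Job("a", 8.0, "fast"), Job("b", 8.0, "slow")]
    print(distribute_blocks(cluster, 10))
    print(schedule_by_cost(jobs, cluster))
```

In this sketch the faster node receives more blocks, and the transfer penalty makes a job prefer the node that already holds its data unless another node is cheaper overall. That is one simple way cost and locality can interact, under the stated assumptions.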

Acknowledgements

This paper has been extracted from a PhD thesis entitled "Improve of scheduling in Hadoop clusters" under the supervision of Dr Yaghoubian, Dr BagheriFard, Dr Nejatian, and Dr Parvin.

Author information

Corresponding author

Correspondence to S. Hadi Yaghoubyan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Javanmardi, A.K., Yaghoubyan, S.H., Bagherifard, K. et al. A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems. J Supercomput 77, 1–22 (2021). https://doi.org/10.1007/s11227-020-03256-4

  • DOI: https://doi.org/10.1007/s11227-020-03256-4
