Abstract
Apache Hadoop is one of the most popular distributed computing systems, used largely for big data analysis and processing. A Hadoop cluster hosts multiple parallel workloads with differing resource requirements (CPU, RAM, etc.). In practice, in heterogeneous Hadoop environments, resource-intensive tasks may be allocated to lower-performing nodes, causing load imbalance between and within clusters and a high data transfer cost. These weaknesses degrade the performance of the Hadoop system and delay the completion of all submitted jobs. To overcome these challenges, this paper proposes an efficient and dynamic load balancing policy for a heterogeneous Hadoop YARN cluster. This novel load balancing model clusters nodes into subgroups of similar performance and then allocates jobs to these subgroups using a multi-criteria ranking. The policy matches resource demands to available resources as accurately as possible in real time, which decreases data transfer in the cluster. The experimental results show that the introduced approach noticeably reduces completion time, by 42% and 11% compared with H-fair and an existing load balancing approach, respectively. Thus, Hadoop can rapidly release resources for the next job, which enhances the overall performance of the distributed computing system. The findings also reveal that our approach optimizes the use of available resources and avoids cluster overload in real time.
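The two stages named in the abstract, grouping nodes of similar performance and then ranking the groups against a job's demands, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Node` fields, the small k-means over normalised CPU and RAM, the three ranking criteria, the weights, and the sample cluster are all hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Node:
    name: str
    cpu_cores: int   # total cores (hypothetical field)
    ram_gb: float    # total memory in GB (hypothetical field)
    load: float      # current utilisation in [0, 1] (hypothetical field)


def normalise(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def cluster_nodes(nodes, k, iterations=20):
    """Group nodes with a tiny k-means over normalised (cpu, ram) capacity."""
    points = list(zip(normalise([n.cpu_cores for n in nodes]),
                      normalise([n.ram_gb for n in nodes])))
    # Deterministic initialisation: evenly spaced nodes in order of capacity.
    order = sorted(range(len(points)), key=lambda i: sum(points[i]))
    centroids = [points[order[j * (len(order) - 1) // max(k - 1, 1)]]
                 for j in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each node joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    groups = [[] for _ in range(k)]
    for node, lab in zip(nodes, labels):
        groups[lab].append(node)
    return [g for g in groups if g]


def rank_groups(groups, cpu_demand, ram_demand, weights=(0.4, 0.4, 0.2)):
    """Rank groups by a weighted penalty; lower means a closer match.

    Criteria (assumed): relative CPU surplus, relative RAM surplus, mean load.
    Groups whose average free capacity cannot cover the demand are skipped.
    """
    w_cpu, w_ram, w_load = weights
    ranked = []
    for g in groups:
        free_cpu = sum(n.cpu_cores * (1 - n.load) for n in g) / len(g)
        free_ram = sum(n.ram_gb * (1 - n.load) for n in g) / len(g)
        mean_load = sum(n.load for n in g) / len(g)
        if free_cpu < cpu_demand or free_ram < ram_demand:
            continue
        penalty = (w_cpu * (free_cpu - cpu_demand) / free_cpu
                   + w_ram * (free_ram - ram_demand) / free_ram
                   + w_load * mean_load)
        ranked.append((penalty, g))
    return sorted(ranked, key=lambda item: item[0])


# Hypothetical six-node heterogeneous cluster.
NODES = [
    Node("small-1", 4, 8, 0.2), Node("small-2", 4, 8, 0.5),
    Node("mid-1", 8, 32, 0.1), Node("mid-2", 8, 32, 0.3),
    Node("big-1", 32, 128, 0.1), Node("big-2", 32, 128, 0.6),
]

if __name__ == "__main__":
    groups = cluster_nodes(NODES, k=3)
    for penalty, group in rank_groups(groups, cpu_demand=4, ram_demand=16):
        print(round(penalty, 3), [n.name for n in group])
```

The ranking deliberately prefers the group with the smallest surplus that still fits the job, so that a modest job does not occupy the most powerful nodes; a real scheduler would also need to refresh the load figures continuously and account for data locality.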
References
Bawankule, K.L., Dewang, R.K., Singh, A.K.: Load balancing approach for a mapreduce job running on a heterogeneous hadoop cluster. In: Goswami, D., Hoang, T.A. (eds.) ICDCIT 2021. LNCS, vol. 12582, pp. 289–298. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65621-8_19
Chen, W., Rao, J., Zhou, X.: Addressing performance heterogeneity in mapreduce clusters with elastic tasks. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 1078–1087. IEEE (2017)
Delgado, P., Didona, D., Dinu, F., Zwaenepoel, W.: Kairos: Preemptive data center scheduling without runtime estimates. In: Proceedings of the ACM Symposium on Cloud Computing, pp. 135–148 (2018)
Jia, R., Yang, Y., Grundy, J., Keung, J., Li, H.: A highly efficient data locality aware task scheduler for cloud-based systems. In: 2019 IEEE 12th International Conference on Cloud Computing (CLOUD), pp. 496–498. IEEE (2019)
Karun, A.K., Chitharanjan, K.: A review on hadoop-hdfs infrastructure extensions. In: 2013 IEEE Conference on Information & Communication Technologies, pp. 132–137. IEEE (2013)
Li, X., Xu, L.D.: A review of internet of things-resource allocation. IEEE Internet Things J. 8(11), 8657–8666 (2020)
Naik, N.S., Negi, A., Br, T.B., Anitha, R.: A data locality based scheduler to enhance mapreduce performance in heterogeneous environments. Futur. Gener. Comput. Syst. 90, 423–434 (2019)
Paik, S.S., Goswami, R.S., Roy, D., Reddy, K.H.: Intelligent data placement in heterogeneous hadoop cluster. In: International Conference on Next Generation Computing Technologies, pp. 568–579. Springer (2017). https://doi.org/10.1007/978-981-10-8657-1_43
Postoaca, A.V., Pop, F., Prodan, R.: h-fair: asymptotic scheduling of heavy workloads in heterogeneous data centers. In: 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 366–369. IEEE (2018)
Saaty, T.L.: Decision making for leaders: the analytic hierarchy process for decisions in a complex world. RWS publications (1990)
Syakur, M., Khotimah, B., Rochman, E., Satoto, B.D.: Integration k-means clustering method and elbow method for identification of the best customer profile cluster. In: IOP Conference Series: Materials Science and Engineering, vol. 336, p. 012017. IOP Publishing (2018)
Thu, M.P., Nwe, K.M., Aye, K.N.: Replication based on data locality for hadoop distributed file system. In: 9th International Workshop on Computer Science and Engineering (WCSE 2019) (2019)
Wang, M., Wu, C.Q., Cao, H., Liu, Y., Wang, Y., Hou, A.: On mapreduce scheduling in hadoop yarn on heterogeneous clusters. In: 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science And Engineering (TrustCom/BigDataSE), pp. 1747–1754. IEEE (2018)
Yan, W., Li, C., Du, S., Mao, X.: An optimization algorithm for heterogeneous hadoop clusters based on dynamic load balancing. In: 2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp. 250–255. IEEE (2016)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hosni, E., Chaari, W., Kolsi, N., Ghedira, K. (2022). Effective Resource Utilization in Heterogeneous Hadoop Environment Through a Dynamic Inter-cluster and Intra-cluster Load Balancing. In: Nguyen, N.T., Tran, T.K., Tukayev, U., Hong, TP., Trawiński, B., Szczerbicki, E. (eds) Intelligent Information and Database Systems. ACIIDS 2022. Lecture Notes in Computer Science, vol 13758. Springer, Cham. https://doi.org/10.1007/978-3-031-21967-2_54
DOI: https://doi.org/10.1007/978-3-031-21967-2_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21966-5
Online ISBN: 978-3-031-21967-2
eBook Packages: Computer Science, Computer Science (R0)