Effective Resource Utilization in Heterogeneous Hadoop Environment Through a Dynamic Inter-cluster and Intra-cluster Load Balancing

Hosni, Emna; chaari, Wided; Kolsi, Nader; Ghedira, Khaled

doi:10.1007/978-3-031-21967-2_54

Emna Hosni (ORCID: orcid.org/0000-0003-3430-2966), Wided chaari, Nader Kolsi & Khaled Ghedira

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13758)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

Apache Hadoop is one of the most popular distributed computing systems, widely used for big data analysis and processing. A Hadoop cluster hosts multiple parallel workloads with varying resource requirements (CPU, RAM, etc.). In practice, in heterogeneous Hadoop environments, resource-intensive tasks may be allocated to lower-performing nodes, causing load imbalance between and within clusters and high data transfer costs. These weaknesses degrade the performance of the Hadoop system and delay the completion of all submitted jobs. To overcome these challenges, this paper proposes an efficient and dynamic load balancing policy for a heterogeneous Hadoop YARN cluster. This novel load balancing model clusters nodes into subgroups of similar performance and then allocates jobs to these subgroups using a multi-criteria ranking. This policy ensures the most accurate match between resource demands and available resources in real time, which decreases data transfer in the cluster. The experimental results show that the introduced approach noticeably reduces the completion time, by 42% and 11% compared with H-fair and a load balancing approach, respectively. Thus, Hadoop can rapidly release resources for the next job, which enhances the overall performance of the distributed computing system. The findings also reveal that our approach optimizes the use of available resources and avoids cluster overload in real time.
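The two-stage idea in the abstract (group nodes of similar performance, then rank the groups against a job's demands) can be illustrated with a minimal Python sketch. The abstract does not specify the clustering algorithm, the ranking method, or the node metrics, so everything here is an assumption made for illustration: plain k-means stands in for the grouping step, a simple weighted sum stands in for the multi-criteria ranking, and the `Node` fields, node names, and weights are hypothetical. It is not the authors' implementation.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Node:
    name: str
    cpu_cores: float  # hypothetical metric: available CPU cores
    ram_gb: float     # hypothetical metric: available RAM in GB


def normalize(nodes):
    """Scale each metric to [0, 1] so CPU and RAM are comparable."""
    max_cpu = max(n.cpu_cores for n in nodes)
    max_ram = max(n.ram_gb for n in nodes)
    return {n.name: (n.cpu_cores / max_cpu, n.ram_gb / max_ram) for n in nodes}


def kmeans(points, k, iterations=20):
    """Plain k-means; seeds are evenly spaced among nodes sorted by total capacity."""
    names = sorted(points, key=lambda name: sum(points[name]))
    step = (len(names) - 1) / (k - 1) if k > 1 else 0
    centroids = [points[names[round(i * step)]] for i in range(k)]
    assignment = {}
    for _ in range(iterations):
        # Assign every node to its nearest centroid.
        assignment = {
            name: min(range(k), key=lambda c: dist(points[name], centroids[c]))
            for name in names
        }
        # Move each centroid to the mean of its members (empty groups stay put).
        for c in range(k):
            members = [points[n] for n in names if assignment[n] == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


def rank_groups(centroids, weights):
    """Order group indices by a weighted sum of centroid metrics, best first."""
    scores = {
        c: sum(w * v for w, v in zip(weights, centroid))
        for c, centroid in enumerate(centroids)
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical five-node heterogeneous cluster.
nodes = [
    Node("n1", 2, 4),
    Node("n2", 2, 8),
    Node("n3", 16, 64),
    Node("n4", 16, 48),
    Node("n5", 8, 16),
]
assignment, centroids = kmeans(normalize(nodes), k=2)

# A CPU-heavy job weights CPU at 0.7 and RAM at 0.3 (assumed weights).
preferred_groups = rank_groups(centroids, weights=(0.7, 0.3))
print(assignment)
print(preferred_groups)
```

In this sketch a scheduler would try the first group in `preferred_groups` and fall back to the next one when that group is saturated. A real policy would also need live utilization data and data-locality information, which the sketch leaves out.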


Notes

1.
https://hadoop.apache.org/.

References

Bawankule, K.L., Dewang, R.K., Singh, A.K.: Load balancing approach for a mapreduce job running on a heterogeneous hadoop cluster. In: Goswami, D., Hoang, T.A. (eds.) ICDCIT 2021. LNCS, vol. 12582, pp. 289–298. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65621-8_19
Chen, W., Rao, J., Zhou, X.: Addressing performance heterogeneity in mapreduce clusters with elastic tasks. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 1078–1087. IEEE (2017)
Delgado, P., Didona, D., Dinu, F., Zwaenepoel, W.: Kairos: Preemptive data center scheduling without runtime estimates. In: Proceedings of the ACM Symposium on Cloud Computing, pp. 135–148 (2018)
Jia, R., Yang, Y., Grundy, J., Keung, J., Li, H.: A highly efficient data locality aware task scheduler for cloud-based systems. In: 2019 IEEE 12th International Conference on Cloud Computing (CLOUD), pp. 496–498. IEEE (2019)
Karun, A.K., Chitharanjan, K.: A review on hadoop-hdfs infrastructure extensions. In: 2013 IEEE Conference on Information & Communication Technologies, pp. 132–137. IEEE (2013)
Li, X., Da, X., L.: A review of internet of things-resource allocation. IEEE Internet Things J. 8(11), 8657–8666 (2020)
Naik, N.S., Negi, A., Br, T.B., Anitha, R.: A data locality based scheduler to enhance mapreduce performance in heterogeneous environments. Futur. Gener. Comput. Syst. 90, 423–434 (2019)
Paik, S.S., Goswami, R.S., Roy, D., Reddy, K.H.: Intelligent data placement in heterogeneous hadoop cluster. In: International Conference on Next Generation Computing Technologies, pp. 568–579. Springer (2017). https://doi.org/10.1007/978-981-10-8657-1_43
Postoaca, A.V., Pop, F., Prodan, R.: h-fair: asymptotic scheduling of heavy workloads in heterogeneous data centers. In: 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 366–369. IEEE (2018)
Saaty, T.L.: Decision making for leaders: the analytic hierarchy process for decisions in a complex world. RWS publications (1990)
Syakur, M., Khotimah, B., Rochman, E., Satoto, B.D.: Integration k-means clustering method and elbow method for identification of the best customer profile cluster. In: IOP Conference Series: Materials Science and Engineering, vol. 336, p. 012017. IOP Publishing (2018)
Thu, M.P., Nwe, K.M., Aye, K.N.: Replication based on data locality for hadoop distributed file system. In: 9th International Workshop on Computer Science and Engineering, WCSE 2019 (2019)
Wang, M., Wu, C.Q., Cao, H., Liu, Y., Wang, Y., Hou, A.: On mapreduce scheduling in hadoop yarn on heterogeneous clusters. In: 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science And Engineering (TrustCom/BigDataSE), pp. 1747–1754. IEEE (2018)
Yan, W., Li, C., Du, S., Mao, X.: An optimization algorithm for heterogeneous hadoop clusters based on dynamic load balancing. In: 2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp. 250–255. IEEE (2016)

Author information

Authors and Affiliations

National School of Computer Sciences, LARIA, Tunisia
Emna Hosni & Wided chaari
Tunis School of Business, LARIA, Tunisia
Nader Kolsi
Honoris United Universities, ESPRIT, Tunisia
Khaled Ghedira

Corresponding author

Correspondence to Emna Hosni.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Vietnam National University, Ho Chi Minh City, Ho Chi Minh City, Vietnam
Tien Khoa Tran
Al-Farabi Kazakh National University, Almaty, Kazakhstan
Ualsher Tukayev
National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński
University of Newcastle, Newcastle, NSW, Australia
Edward Szczerbicki

About this paper

Cite this paper

Hosni, E., chaari, W., Kolsi, N., Ghedira, K. (2022). Effective Resource Utilization in Heterogeneous Hadoop Environment Through a Dynamic Inter-cluster and Intra-cluster Load Balancing. In: Nguyen, N.T., Tran, T.K., Tukayev, U., Hong, TP., Trawiński, B., Szczerbicki, E. (eds) Intelligent Information and Database Systems. ACIIDS 2022. Lecture Notes in Computer Science, vol 13758. Springer, Cham. https://doi.org/10.1007/978-3-031-21967-2_54

DOI: https://doi.org/10.1007/978-3-031-21967-2_54
Published: 09 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21966-5
Online ISBN: 978-3-031-21967-2
eBook Packages: Computer Science, Computer Science (R0)
