Effective Resource Utilization in Heterogeneous Hadoop Environment Through a Dynamic Inter-cluster and Intra-cluster Load Balancing

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13758)

Abstract

Apache Hadoop is one of the most popular distributed computing systems, used largely for big data analysis and processing. A Hadoop cluster hosts multiple parallel workloads with differing resource requirements (CPU, RAM, etc.). In practice, in heterogeneous Hadoop environments, resource-intensive tasks may be allocated to lower-performing nodes, causing load imbalance between and within clusters and a high data transfer cost. These weaknesses degrade the performance of the Hadoop system and delay the completion of all submitted jobs. To overcome these challenges, this paper proposes an efficient and dynamic load balancing policy for a heterogeneous Hadoop YARN cluster. This novel load balancing model is based on clustering nodes into subgroups of nodes with similar performance, and then allocating jobs to these subgroups using a multi-criteria ranking. This policy matches resource demands to available resources as accurately as possible in real time, which decreases data transfer in the cluster. The experimental results show that the introduced approach noticeably reduces the completion time, by 42% and 11% compared with H-fair and a load balancing approach, respectively. Thus, Hadoop can rapidly release resources for the next job, which enhances the overall performance of the distributed computing system. The findings also reveal that our approach optimizes the use of the available resources and avoids cluster overload in real time.
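
The two steps named in the abstract (grouping nodes of similar performance, then ranking candidates on several criteria) can be illustrated with a minimal Python sketch. The abstract does not state which clustering algorithm, criteria, or weights are used, so everything here is an assumption for illustration: the `Node` fields, the equal-weight capacity score, the one-dimensional k-means grouping, and the weighted-sum ranking are hypothetical stand-ins, not the paper's method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    cpu_cores: int    # static capacity (hypothetical metric)
    ram_gb: int       # static capacity (hypothetical metric)
    free_cpu: float   # fraction of CPU currently free, 0..1
    free_ram: float   # fraction of RAM currently free, 0..1


def capacity_score(node: Node, max_cores: int, max_ram: int) -> float:
    """Normalized static performance score in [0, 1] (equal weights assumed)."""
    return 0.5 * node.cpu_cores / max_cores + 0.5 * node.ram_gb / max_ram


def group_by_performance(nodes: List[Node], k: int, iters: int = 20) -> Dict[int, List[Node]]:
    """Group nodes with a simple 1-D k-means over their capacity scores."""
    max_cores = max(n.cpu_cores for n in nodes)
    max_ram = max(n.ram_gb for n in nodes)
    scores = {n.name: capacity_score(n, max_cores, max_ram) for n in nodes}
    ordered = sorted(scores.values())
    # Deterministic start: spread the initial centroids over the sorted scores.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    groups: Dict[int, List[Node]] = {}
    for _ in range(iters):
        groups = {i: [] for i in range(k)}
        for n in nodes:
            nearest = min(range(k), key=lambda i: abs(scores[n.name] - centroids[i]))
            groups[nearest].append(n)
        # Recompute each centroid; an empty group keeps its previous centroid.
        centroids = [
            sum(scores[n.name] for n in g) / len(g) if g else centroids[i]
            for i, g in groups.items()
        ]
    return groups


def rank_nodes(group: List[Node], weights: Tuple[float, float]) -> List[Node]:
    """Rank nodes by a weighted sum of free CPU and free RAM, best first."""
    w_cpu, w_ram = weights
    return sorted(group, key=lambda n: w_cpu * n.free_cpu + w_ram * n.free_ram, reverse=True)


nodes = [
    Node("n1", 4, 8, 0.9, 0.8),
    Node("n2", 4, 8, 0.2, 0.5),
    Node("n3", 16, 64, 0.5, 0.6),
    Node("n4", 16, 64, 0.7, 0.9),
]
groups = group_by_performance(nodes, k=2)
# A CPU-heavy job (assumed weights 0.7 CPU, 0.3 RAM) goes to the best-ranked
# node of the group that contains the high-capacity machines.
strong_group = next(g for g in groups.values() if any(n.name == "n3" for n in g))
best = rank_nodes(strong_group, (0.7, 0.3))[0]
print(best.name)  # n4
```

The weighted sum is only the simplest possible multi-criteria ranking; a scheduler that tracks live load would refresh `free_cpu` and `free_ram` before each placement decision.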


Notes

  1. https://hadoop.apache.org/

Author information

Corresponding author

Correspondence to Emna Hosni.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hosni, E., Chaari, W., Kolsi, N., Ghedira, K. (2022). Effective Resource Utilization in Heterogeneous Hadoop Environment Through a Dynamic Inter-cluster and Intra-cluster Load Balancing. In: Nguyen, N.T., Tran, T.K., Tukayev, U., Hong, TP., Trawiński, B., Szczerbicki, E. (eds) Intelligent Information and Database Systems. ACIIDS 2022. Lecture Notes in Computer Science, vol 13758. Springer, Cham. https://doi.org/10.1007/978-3-031-21967-2_54

  • DOI: https://doi.org/10.1007/978-3-031-21967-2_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21966-5

  • Online ISBN: 978-3-031-21967-2

  • eBook Packages: Computer Science, Computer Science (R0)
