Historical data based approach for straggler avoidance in a heterogeneous Hadoop cluster

  • Original Research
  • Journal of Ambient Intelligence and Humanized Computing

Abstract

Cloud computing has emerged as a new way of sharing resources. MapReduce has become the de facto standard for data-intensive parallel computation in the cloud. Hadoop is an open-source framework that implements MapReduce on clusters of commodity hardware. An environment built from different generations of commodity hardware (nodes) introduces heterogeneity into the Hadoop cluster, and such heterogeneity is now common in industry as well as in research centers. Hadoop's current implementation assumes that the nodes are homogeneous and distributes the workload evenly among them. In a heterogeneous Hadoop environment this homogeneity assumption creates a load imbalance among the nodes, which in turn leads to stragglers. Stragglers are nodes that remain available in the environment but perform very poorly. This paper proposes a Historical data based data placement (HDBDP) policy that balances the workload among heterogeneous nodes according to their computing capabilities, in order to improve Map task data locality and reduce job turnaround time in the heterogeneous Hadoop environment. The approach introduces an agent that measures each node's computing capability from job history information and helps the NameNode decide the block count for each node in the environment. The proposed policy is evaluated on the HiBench benchmark suite, Hadoop's most popular benchmark. Compared to Hadoop's default data placement policy and other policies, the proposed HDBDP policy reduces the job turnaround time for several workloads by an average of 14–26%. It also improves Map task data locality by nearly 27% in a heterogeneous Hadoop environment.
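
The abstract does not give the formula the HDBDP agent uses, so the following is only a minimal sketch of the general idea of capability-proportional block placement, not the authors' method. It assumes that a node's computing capability can be approximated by its historical Map task throughput (bytes processed per second), and that block counts are assigned in proportion to that value. The function names `node_capabilities` and `block_counts`, and the `(node, bytes, seconds)` record layout, are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def node_capabilities(history):
    """Estimate per-node capability as historical throughput (bytes/second).

    `history` is an iterable of (node, bytes_processed, seconds) records.
    This record layout is an assumption, not Hadoop's job history format.
    """
    total_bytes = defaultdict(float)
    total_secs = defaultdict(float)
    for node, nbytes, secs in history:
        total_bytes[node] += nbytes
        total_secs[node] += secs
    # Skip nodes with no recorded runtime to avoid dividing by zero.
    return {n: total_bytes[n] / total_secs[n]
            for n in total_bytes if total_secs[n] > 0}


def block_counts(capabilities, total_blocks):
    """Split `total_blocks` among nodes in proportion to their capability.

    Uses largest-remainder rounding so the counts sum to `total_blocks`.
    """
    total = sum(capabilities.values())
    exact = {n: total_blocks * c / total for n, c in capabilities.items()}
    counts = {n: int(x) for n, x in exact.items()}
    leftover = total_blocks - sum(counts.values())
    # Hand the remaining blocks to the nodes with the largest fractional part.
    by_fraction = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for n in by_fraction[:leftover]:
        counts[n] += 1
    return counts


if __name__ == "__main__":
    # Made-up history for two hypothetical nodes, for illustration only.
    history = [("fast", 200, 10), ("fast", 200, 10), ("slow", 100, 20)]
    caps = node_capabilities(history)
    print(block_counts(caps, 10))
```

With this made-up history the faster node has four times the throughput of the slower one, so it receives four times as many blocks. An even split, as under the homogeneity assumption described above, would instead give both nodes the same share and leave the slower node as the likely straggler.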




Author information

Corresponding author

Correspondence to Kamalakant Laxman Bawankule.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bawankule, K.L., Dewang, R.K. & Singh, A.K. Historical data based approach for straggler avoidance in a heterogeneous Hadoop cluster. J Ambient Intell Human Comput 12, 9573–9589 (2021). https://doi.org/10.1007/s12652-020-02699-0

  • DOI: https://doi.org/10.1007/s12652-020-02699-0
