A Predictive Map Task Scheduler for Optimizing Data Locality in MapReduce Clusters

Mohamed Merabet, Sidi Mohamed Benslimane, Mahmoud Barhamgi, Christine Bonnet
Copyright: © 2018 | Volume: 10 | Issue: 4 | Pages: 14
ISSN: 1938-0259 | EISSN: 1938-0267 | EISBN13: 9781522543398 | DOI: 10.4018/IJGHPC.2018100101
Cite Article

MLA

Merabet, Mohamed, et al. "A Predictive Map Task Scheduler for Optimizing Data Locality in MapReduce Clusters." IJGHPC vol.10, no.4 2018: pp.1-14. http://doi.org/10.4018/IJGHPC.2018100101

APA

Merabet, M., Benslimane, S. M., Barhamgi, M., & Bonnet, C. (2018). A Predictive Map Task Scheduler for Optimizing Data Locality in MapReduce Clusters. International Journal of Grid and High Performance Computing (IJGHPC), 10(4), 1-14. http://doi.org/10.4018/IJGHPC.2018100101

Chicago

Merabet, Mohamed, et al. "A Predictive Map Task Scheduler for Optimizing Data Locality in MapReduce Clusters," International Journal of Grid and High Performance Computing (IJGHPC) 10, no.4: 1-14. http://doi.org/10.4018/IJGHPC.2018100101

Abstract

This article describes how data locality is becoming one of the most critical factors affecting the performance of MapReduce clusters, because network bisection bandwidth is becoming a bottleneck. The task scheduler assigns the most appropriate map tasks to nodes. If map tasks are scheduled to nodes that do not hold their input data, these tasks must issue remote I/O operations to copy the data to the local node, which lengthens the execution time of the map tasks. In that case, a prefetching mechanism can be useful to preload the needed input data before the tasks are launched. The key challenge is therefore how to accurately predict the execution time of map tasks, so that data prefetching can be used effectively without any data access delay. This article proposes a Predictive Map Task Scheduler that assigns the most suitable map tasks to nodes ahead of time. It uses a linear regression model for prediction and a data-locality-based algorithm for task scheduling. The experimental results show that the method can greatly improve both the data locality and the execution time of map tasks.
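The abstract does not state which features the regression uses or the exact scheduling rules. The following is therefore only a minimal Python sketch of the general idea under assumed choices, not the authors' algorithm. It assumes a single-feature linear model (input split size in MB predicts map task duration in seconds) and a fixed network bandwidth. It also uses a hypothetical `pick_next_task` rule: prefer a data-local task; otherwise choose a remote task whose split can be prefetched before the node's predicted free time; otherwise fall back to a remote read. All names (`fit_linear`, `predict_free_time`, `MapTask`, `pick_next_task`) and the sample numbers are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: NOT the paper's implementation. The feature (split size),
# the constant bandwidth, and the selection rules are assumptions for illustration.

def fit_linear(sizes_mb, durations_s):
    """Fit duration = a + b * size by ordinary least squares (one feature)."""
    n = len(sizes_mb)
    mean_x = sum(sizes_mb) / n
    mean_y = sum(durations_s) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_mb)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_mb, durations_s))
    b = sxy / sxx
    return mean_y - b * mean_x, b  # (intercept a, slope b)

def predict_free_time(model, start_s, input_mb):
    """Predicted time at which a running map task finishes and frees its slot."""
    a, b = model
    return start_s + a + b * input_mb

@dataclass
class MapTask:
    task_id: str
    input_mb: float
    replica_nodes: set = field(default_factory=set)  # nodes holding a replica of the split

def pick_next_task(node, pending, now_s, free_at_s, bandwidth_mb_s):
    """Return (task, mode) with mode in {"local", "prefetch", "remote"}, or (None, None)."""
    if not pending:
        return None, None
    # 1) Data-local task: its input split is already stored on this node.
    for task in pending:
        if node in task.replica_nodes:
            return task, "local"
    # 2) Remote task whose split can be copied before the node's slot frees up.
    slack_s = free_at_s - now_s
    for task in sorted(pending, key=lambda t: t.input_mb):
        if task.input_mb / bandwidth_mb_s <= slack_s:
            return task, "prefetch"
    # 3) Otherwise fall back to the smallest remote task, read over the network at launch.
    return min(pending, key=lambda t: t.input_mb), "remote"

# Usage with made-up history: past map tasks of 64/128/256 MB took 4/7/13 s.
model = fit_linear([64, 128, 256], [4.0, 7.0, 13.0])
free_at = predict_free_time(model, start_s=0.0, input_mb=128)  # about 7.0 s
pending = [MapTask("m1", 128, {"n2"}), MapTask("m2", 64, {"n3"}), MapTask("m3", 256, {"n2"})]
task, mode = pick_next_task("n1", pending, now_s=0.0, free_at_s=free_at, bandwidth_mb_s=20.0)
print(task.task_id, mode)  # m2 prefetch
```

Here node `n1` holds no replica of any pending split, but its current task is predicted to finish in about 7 s. The 64 MB split of `m2` needs only 3.2 s at the assumed 20 MB/s, so it can be preloaded while `n1` is still busy. That is the situation in which accurate execution-time prediction lets prefetching hide the remote I/O cost.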