ABSTRACT
Current MapReduce/Hadoop schedulers deliver good performance, but room for improvement remains: map and reduce tasks are not jointly optimized for scheduling, even though they depend strongly on each other. This can cause job starvation and poor data locality. We design a resource-aware scheduler for Hadoop that couples the progress of mappers and reducers and jointly optimizes the placement of both. This mitigates the starvation problem and improves overall data locality. Our experiments demonstrate improvements in job response times of up to an order of magnitude.
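To make the idea of coupling mapper and reducer progress concrete, the sketch below caps the number of reducers a job may launch by the fraction of its map tasks that have finished. The abstract does not state the actual rule, so the proportional cap and the names `allowed_reducers` and `reducers_to_launch` are assumptions made purely for illustration; this is not the authors' implementation.

```python
def allowed_reducers(maps_done: int, maps_total: int, reducers_total: int) -> int:
    """Hypothetical coupling rule: the reducer cap grows with map progress.

    Returns ceil(maps_done / maps_total * reducers_total), computed with
    integer arithmetic to avoid floating-point rounding surprises.
    """
    if maps_total <= 0:
        raise ValueError("maps_total must be positive")
    if not 0 <= maps_done <= maps_total:
        raise ValueError("maps_done must lie in [0, maps_total]")
    return (maps_done * reducers_total + maps_total - 1) // maps_total


def reducers_to_launch(maps_done: int, maps_total: int,
                       reducers_total: int, reducers_launched: int) -> int:
    """Number of additional reducers the job may start right now (assumed rule)."""
    cap = allowed_reducers(maps_done, maps_total, reducers_total)
    return max(0, cap - reducers_launched)


if __name__ == "__main__":
    # A job with 10 map tasks and 4 reduce tasks, 1 reducer already launched.
    for done in (0, 1, 5, 10):
        print(done, reducers_to_launch(done, 10, 4, 1))
```

Under this assumed rule, a job whose maps have barely started cannot occupy every reduce slot while waiting for map output, which is one way such coupling could reduce starvation of other jobs.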
Index Terms
- Coupling scheduler for MapReduce/Hadoop