Abstract
MapReduce is a parallel programming system for processing massive data sets. It automatically splits a MapReduce job into multiple tasks and schedules them onto a cluster built from PCs. This paper describes a data-distribution-aware task scheduling strategy for MapReduce. When a worker node requests tasks, the strategy computes the node's priority from factors such as the number of requests the node has made and the number of tasks it can execute locally. It also computes each task's priority from factors such as the number of copies associated with the task and the task's latency. Using these node and task priorities, the strategy takes full account of how data is distributed in the system and schedules Map tasks, with high probability, to nodes that already hold the required data, which reduces network overhead and improves system efficiency.
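The abstract names the inputs to the two priorities but not the formulas, so the following Python sketch only illustrates the general idea. Everything specific in it is an assumption made for illustration: the linear weighting, the weight values, the reading of "copies" as replicas of a task's input data, and the names `Node`, `Task`, `node_priority`, `task_priority`, and `schedule`. It should not be read as the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    replica_nodes: frozenset  # assumed: ids of nodes holding a copy of the input data
    wait_time: float          # assumed: seconds the task has been pending


@dataclass
class Node:
    node_id: str
    request_count: int  # number of task requests this node has made


def local_tasks(node, tasks):
    """Tasks whose input data has a copy on this node."""
    return [t for t in tasks if node.node_id in t.replica_nodes]


def node_priority(node, tasks, w_req=1.0, w_local=1.0):
    # Assumed linear form: more requests and more locally runnable
    # tasks both raise the node's priority.
    return w_req * node.request_count + w_local * len(local_tasks(node, tasks))


def task_priority(task, w_copies=1.0, w_wait=0.1):
    # Assumed form: fewer copies means fewer chances to run locally,
    # and a longer wait makes the task more urgent.
    return w_copies / max(len(task.replica_nodes), 1) + w_wait * task.wait_time


def schedule(requesting_nodes, pending_tasks):
    """Serve nodes in descending priority; prefer tasks with local data."""
    remaining = list(pending_tasks)
    assignment = {}
    ranked = sorted(
        requesting_nodes,
        key=lambda n: node_priority(n, pending_tasks),
        reverse=True,
    )
    for node in ranked:
        if not remaining:
            break
        # Fall back to any pending task only when nothing is local.
        candidates = local_tasks(node, remaining) or remaining
        chosen = max(candidates, key=task_priority)
        assignment[node.node_id] = chosen.task_id
        remaining.remove(chosen)
    return assignment


tasks = [
    Task("t1", frozenset({"a"}), 0.0),
    Task("t2", frozenset({"a", "b"}), 0.0),
    Task("t3", frozenset({"c"}), 10.0),
]
nodes = [Node("a", 1), Node("b", 1)]
print(schedule(nodes, tasks))  # {'a': 't1', 'b': 't2'}
```

In this example `t3` has the highest task priority, yet it stays pending because neither requesting node holds its data; a node with no local tasks would receive it through the fallback branch. Whether the original strategy behaves this way in such a case is not stated in the abstract.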
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guo, L., Sun, H., Luo, Z. (2009). A Data Distribution Aware Task Scheduling Strategy for MapReduce System. In: Jaatun, M.G., Zhao, G., Rong, C. (eds) Cloud Computing. CloudCom 2009. Lecture Notes in Computer Science, vol 5931. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10665-1_74
DOI: https://doi.org/10.1007/978-3-642-10665-1_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10664-4
Online ISBN: 978-3-642-10665-1
eBook Packages: Computer Science, Computer Science (R0)