Skip to main content

A Data Distribution Aware Task Scheduling Strategy for MapReduce System

  • Conference paper
Cloud Computing (CloudCom 2009)

Part of the book series: Lecture Notes in Computer Science ((LNCCN,volume 5931))

Included in the following conference series:

Abstract

MapReduce is a parallel programming system to deal with massive data. It can automatically parallelize MapReduce jobs into multiple tasks, schedule to a cluster built by PCs. This paper describes a data distribution aware MapReduce task scheduling strategy. When worker nodes requests for tasks, it will compute and obtain nodes’ priority according to the times for request, the number of tasks which can be executed locally and so on. Meanwhile, it can also calculate tasks’ priority according to the numbers of copies executed by the task, latency time of tasks and so on. This strategy is based on node and task’s scheduling priority, fully considers data distribution in the system and thus schedules Map tasks to nodes having data in high probability, to reduce network overhead and improve system efficiency.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. In: OSDI 2004, 6th Symposiumon Operating Systems Design and Implementation, Sponsored by USENIX, incooperation with ACM SIGOPS, pp. 137–150 (2004)

    Google Scholar 

  2. Ghemawat, S., Gobioff, H., Leung, S.-T.: The Google File System. In: Proceedings of the 19th ACM Symposium on Operating Systems Principles, pp. 20–43 (2003)

    Google Scholar 

  3. Hadoop opensource project, http://hadoop.apache.org/

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, L., Sun, H., Luo, Z. (2009). A Data Distribution Aware Task Scheduling Strategy for MapReduce System. In: Jaatun, M.G., Zhao, G., Rong, C. (eds) Cloud Computing. CloudCom 2009. Lecture Notes in Computer Science, vol 5931. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10665-1_74

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-10665-1_74

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10664-4

  • Online ISBN: 978-3-642-10665-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics