A MapReduce task scheduling algorithm for deadline constraints

Tang, Zhuo; Zhou, Junqing; Li, Kenli; Li, Ruixuan

doi:10.1007/s10586-012-0236-5

Published: 12 December 2012

Cluster Computing, Volume 16, pages 651–662 (2013)

Abstract

Existing work on MapReduce task scheduling with deadline constraints takes into account neither the differences between Map and Reduce tasks nor the heterogeneity of the cluster. This paper proposes MTSD, an extended MapReduce Task Scheduling algorithm for Deadline constraints on the Hadoop platform. MTSD allows users to specify a job’s deadline and tries to finish the job before that deadline. By measuring each node’s computing capacity, MTSD’s node classification algorithm divides the nodes of a heterogeneous cluster into several levels. Based on this classification, we first present a novel data distribution model that distributes data according to each node’s capacity level. Experiments show that the node classification algorithm noticeably improves data locality compared with the default scheduler, and that it can also improve the locality of other schedulers. Second, we calculate a task’s average completion time based on the node level, which improves the precision of the estimate of a task’s remaining time. Finally, MTSD provides a mechanism that decides which job’s tasks should be scheduled by calculating the Map and Reduce task slot requirements.
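The abstract names three mechanisms without giving their formulas: grading nodes into capacity levels and placing data by level, estimating a task's remaining time from per-level average completion times, and choosing which job to serve from its slot requirements. The Python sketch below illustrates one way these ideas could fit together. It is not MTSD's actual algorithm: every function name is hypothetical, and the equal-width capacity bands, the rule of weighting each node by its level's mean capacity, the single-reduce-wave deadline model and the "largest slot shortfall" selection rule are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch of the three ideas named in the MTSD abstract.
# Level banding, block weighting, the slot formula and the job-selection
# rule are assumptions, not the paper's exact definitions.


@dataclass
class Node:
    name: str
    capacity: float  # assumed: measured throughput of a benchmark task (e.g. MB/s)


def classify_nodes(nodes, num_levels):
    """Map node name -> level (1 = fastest) using equal-width capacity bands."""
    caps = [n.capacity for n in nodes]
    lo, hi = min(caps), max(caps)
    width = (hi - lo) / num_levels or 1.0  # avoid division by zero if all equal
    return {n.name: min(int((hi - n.capacity) / width), num_levels - 1) + 1
            for n in nodes}


def distribute_blocks(nodes, levels, num_blocks):
    """Give each node a share of blocks proportional to its level's mean capacity."""
    per_level = {}
    for n in nodes:
        per_level.setdefault(levels[n.name], []).append(n.capacity)
    level_cap = {lv: sum(c) / len(c) for lv, c in per_level.items()}
    weight = {n.name: level_cap[levels[n.name]] for n in nodes}
    total = sum(weight.values())
    raw = {k: num_blocks * w / total for k, w in weight.items()}
    share = {k: math.floor(v) for k, v in raw.items()}
    # Largest-remainder rounding so the shares add up to num_blocks.
    leftover = num_blocks - sum(share.values())
    for k in sorted(raw, key=lambda k: raw[k] - share[k], reverse=True)[:leftover]:
        share[k] += 1
    return share


def avg_time_by_level(finished):
    """finished: iterable of (level, seconds) for completed tasks of one phase."""
    sums, counts = {}, {}
    for lv, secs in finished:
        sums[lv] = sums.get(lv, 0.0) + secs
        counts[lv] = counts.get(lv, 0) + 1
    return {lv: sums[lv] / counts[lv] for lv in sums}


def remaining_time(level, elapsed, avg_by_level):
    """Estimated seconds left for a running task on a node of the given level."""
    return max(0.0, avg_by_level[level] - elapsed)


def slot_requirements(maps_left, reduces_left, avg_map, avg_reduce, time_to_deadline):
    """Return (map_slots, reduce_slots) needed to meet the deadline.

    Assumption: reduces start after all maps finish and run in a single wave,
    so they need one slot each and take about avg_reduce seconds. Map work is
    treated as divisible across slots within the remaining map window.
    """
    map_window = time_to_deadline - avg_reduce
    if map_window <= 0:
        return math.inf, reduces_left  # the deadline can no longer be met
    return math.ceil(maps_left * avg_map / map_window), reduces_left


def pick_job(jobs):
    """jobs: dict name -> (required_map_slots, held_map_slots).

    Assumed rule: serve the job whose requirement exceeds its allocation the most.
    """
    return max(jobs, key=lambda j: jobs[j][0] - jobs[j][1])
```

For example, with node capacities 100, 80, 50 and 20 and three levels, the two fastest nodes land in level 1 and the others in levels 2 and 3, so 100 blocks are split 36/36/20/8. Weighting by the level's mean rather than each node's own capacity mirrors the abstract's "according to the node's capacity level", at the cost of ignoring differences inside a level.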

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61103047, 61173170) and the National Postdoctoral Science Foundation of China (20100480936).

Author information

Authors and Affiliations

School of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
Zhuo Tang, Junqing Zhou & Kenli Li
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
Ruixuan Li

Corresponding author

Correspondence to Zhuo Tang.

About this article

Cite this article

Tang, Z., Zhou, J., Li, K. et al. A MapReduce task scheduling algorithm for deadline constraints. Cluster Comput 16, 651–662 (2013). https://doi.org/10.1007/s10586-012-0236-5

Received: 15 April 2012
Accepted: 15 October 2012
Published: 12 December 2012
Issue Date: December 2013
DOI: https://doi.org/10.1007/s10586-012-0236-5
