Abstract
Existing work on MapReduce task scheduling with deadline constraints takes into account neither the differences between Map and Reduce tasks nor the heterogeneity of the cluster. This paper proposes MTSD, an extended MapReduce Task Scheduling algorithm for Deadline constraints on the Hadoop platform. It allows a user to specify a job's deadline and tries to finish the job before that deadline. MTSD includes a node classification algorithm that measures each node's computing capacity and classifies the nodes of a heterogeneous cluster into several levels. Based on this classification, we first present a novel data distribution model that distributes data according to each node's capacity level. Experiments show that the node classification algorithm noticeably improves data locality compared with the default scheduler, and that it can also improve the locality of other schedulers. Second, we calculate the average task completion time per node level, which improves the precision of the estimate of a task's remaining time. Finally, MTSD provides a mechanism that decides which job's tasks should be scheduled by calculating the job's Map and Reduce task slot requirements.
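The abstract does not give MTSD's formulas, so the following Python sketch is only an illustration of the three ideas it names, not the authors' method. It makes these assumptions: nodes are put into levels by equal-width binning of a measured capacity score, data blocks are split in proportion to node level, and the slot requirement is a simple work-based lower bound (remaining work divided by time left). All function names, parameters, and sample numbers are hypothetical.

```python
import math


def classify_nodes(capacities, num_levels):
    """Assign each node a level in 1..num_levels.

    Assumption: equal-width binning of the capacity score.
    The paper's actual classification rule is not stated in the abstract.
    """
    lo, hi = min(capacities.values()), max(capacities.values())
    width = (hi - lo) / num_levels or 1.0  # avoid zero width if all nodes are equal
    levels = {}
    for node, cap in capacities.items():
        level = int((cap - lo) / width) + 1
        levels[node] = min(level, num_levels)  # the fastest node falls on the top edge
    return levels


def distribute_blocks(levels, total_blocks):
    """Split total_blocks across nodes in proportion to their level.

    Assumption: level is used directly as the weight; leftover blocks go to
    the nodes with the largest fractional remainder.
    """
    total_weight = sum(levels.values())
    exact = {n: total_blocks * lv / total_weight for n, lv in levels.items()}
    shares = {n: math.floor(x) for n, x in exact.items()}
    leftover = total_blocks - sum(shares.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for n in by_remainder[:leftover]:
        shares[n] += 1
    return shares


def min_slots(remaining_tasks, avg_task_seconds, seconds_to_deadline):
    """Lower bound on the slots needed to finish the remaining tasks in time.

    Assumption: tasks are independent and take avg_task_seconds each.
    """
    if seconds_to_deadline <= 0:
        raise ValueError("deadline has already passed")
    return math.ceil(remaining_tasks * avg_task_seconds / seconds_to_deadline)


if __name__ == "__main__":
    capacities = {"n1": 1.0, "n2": 2.0, "n3": 4.0}  # hypothetical capacity scores
    levels = classify_nodes(capacities, num_levels=3)
    print(levels)
    print(distribute_blocks(levels, total_blocks=12))
    print(min_slots(remaining_tasks=40, avg_task_seconds=30, seconds_to_deadline=300))
```

In this sketch a scheduler would call `min_slots` separately for the Map and Reduce phases of each job, using an average task time measured per node level, and compare the results with the slots currently available.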
Acknowledgements
This work is supported by the National Natural Science Foundation of China (61103047, 61173170) and the National Postdoctoral Science Foundation of China (20100480936).
Cite this article
Tang, Z., Zhou, J., Li, K. et al. A MapReduce task scheduling algorithm for deadline constraints. Cluster Comput 16, 651–662 (2013). https://doi.org/10.1007/s10586-012-0236-5
DOI: https://doi.org/10.1007/s10586-012-0236-5