Abstract
MapReduce is a popular big data processing framework, and its performance is closely tied to the efficiency of its centralized scheduler. In practice, the centralized scheduler often has little information in advance, so each job may become known only when it is released. In this paper, we therefore consider the online MapReduce scheduling problem of minimizing the makespan, where jobs are released over time. Both the preemptive and non-preemptive versions of the problem are considered. In addition, we assume that reduce tasks cannot be parallelized because they are often complex and hard to decompose. For the non-preemptive version, we prove a lower bound of \(\frac{m+m(\Psi (m)-\Psi (k))}{k+m(\Psi (m)-\Psi (k))}\) on the competitive ratio, which is higher than that of the basic online machine scheduling problem, where k is the root of the equation \(k=\big \lfloor {\frac{m-k}{1+\Psi (m)-\Psi (k)}+1 }\big \rfloor \) and m is the number of machines. We then devise a \((2-\frac{1}{m})\)-competitive online algorithm called MF-LPT (Map First-Longest Processing Time), based on LPT. For the preemptive version, we present a 1-competitive algorithm for two machines.
References
Chang H, Kodialam M, Kompella R, Lakshman T, Lee M, Mukherjee S (2011) Scheduling in mapreduce-like systems for fast completion time. In: INFOCOM, 2011 Proceedings IEEE, pp 3074–3082. doi:10.1109/INFCOM.2011.5935152
Chen B, Vestjens A (1997) Scheduling on identical machines: how good is lpt in an on-line setting? Oper Res Lett 21(4):165–169. doi:10.1016/S0167-6377(97)00040-0
Chen F, Kodialam M, Lakshman TV (2012) Joint scheduling of processing and shuffle phases in mapreduce systems. In: INFOCOM, 2012 Proceedings IEEE, pp 1143–1151. doi:10.1109/INFCOM.2012.6195473
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113. doi:10.1145/1327452.1327492
DeTemple DW (1993) A quicker convergence to euler’s constant. Am Math Mon 100(5):468–470
Fiat A, Woeginger G (1998) Competitive analysis of algorithms. In: Fiat A, Woeginger G (eds) Online algorithms, lecture notes in computer science, vol 1442. Springer, Berlin, pp 1–12. doi:10.1007/BFb0029562
Guo S, Kang L (2013) Online scheduling of parallel jobs with preemption on two identical machines. Oper Res Lett 41(2):207–209. doi:10.1016/j.orl.2013.01.002
Isard M, Prabhakaran V, Currey J, Wieder U, Talwar K, Goldberg A (2009) Quincy: fair scheduling for distributed computing clusters. In: Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, ACM, pp 261–276. doi:10.1145/1629575.1629601
Luo T, Zhu Y, Wu W, Xu Y, Du DZ (2015) Online makespan minimization in mapreduce-like systems with complex reduce tasks. Optim Lett. doi:10.1007/s11590-015-0902-7
Moseley B, Dasgupta A, Kumar R, Sarlós T (2011) On scheduling in map-reduce and flow-shops. In: Proceedings of the twenty-third annual ACM symposium on parallelism in algorithms and architectures, ACM, SPAA ’11, pp 289–298. doi:10.1145/1989493.1989540
Olver FW (2010) NIST handbook of mathematical functions. Cambridge University Press, Cambridge
Pinedo M (2012) Parallel machine models (deterministic). In: Scheduling, Springer US, pp 111–149. doi:10.1007/978-1-4614-2361-4_5
Sandholm T, Lai K (2009) Mapreduce optimization using regulated dynamic prioritization. SIGMETRICS Perform Eval Rev 37(1):299–310. doi:10.1145/2492101.1555384
Tan J, Meng X, Zhang L (2012) Performance analysis of coupling scheduler for mapreduce/hadoop. In: INFOCOM, 2012 Proceedings IEEE, pp 2586–2590. doi:10.1109/INFCOM.2012.6195658
Wang W, Zhu K, Ying L, Tan J, Zhang L (2013) Map task scheduling in mapreduce with data locality: Throughput and heavy-traffic optimality. In: INFOCOM, 2013 Proceedings IEEE, pp 1609–1617. doi:10.1109/INFCOM.2013.6566957
Yuan Y, Wang D, Liu J (2014) Joint scheduling of mapreduce jobs with servers: performance bounds and experiments. In: INFOCOM, 2014 Proceedings IEEE, pp 2175–2183. doi:10.1109/INFOCOM.2014.6848160
Zaharia M, Konwinski A, Joseph A, Katz R, Stoica I (2008) Improving mapreduce performance in heterogeneous environments. In: 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI 08)
Zaharia M, Borthakur D, Sarma JS, Elmeleegy K, Shenker S, Stoica I (2010) Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th European conference on Computer systems, ACM, pp 265–278. doi:10.1145/1755913.1755940
Zheng Y, Shroff N, Sinha P (2013) A new analytical technique for designing provably efficient mapreduce schedulers. In: INFOCOM, 2013 Proceedings IEEE, pp 1600–1608. doi:10.1109/INFCOM.2013.6566956
Zhu Y, Jiang Y, Wu W, Ding L, Teredesai A, Li D, Lee W (2014) Minimizing makespan and total completion time in mapreduce-like systems. In: INFOCOM, 2014 Proceedings IEEE, pp 2166–2174. doi:10.1109/INFOCOM.2014.6848159
Acknowledgments
The authors sincerely thank Sisi Zhao, the editor and the two referees for their comments and suggestions, which have improved this paper considerably. This work was partially supported by the National Natural Science Foundation of China (Grant 61221063), the Program for Changjiang Scholars and Innovative Research Team in University (IRT1173), and China Postdoctoral Science Foundation (2015T81040).
Appendices
Appendix 1: The solution of \(\rho ^*\)
To make (3.5) easier to solve, we transform it into:
Substituting (7.1b) and (7.1c) into (7.1a), we have
Note that, in (7.2), \(\frac{\partial b_1}{\partial s_i}=m>0 ~ (\forall i\in \{2,3,\ldots ,m\}) \) and \(\frac{\partial b_j}{\partial s_i}=\frac{j-1}{1-(m-j+1)s_1}>0 ~ (\forall i,j\in \{2,3,\ldots ,m\})\). Hence, \(\forall j\in \{1,2,\ldots ,m\}\), \(b_j\) is monotonically increasing in \(s_i\) for any \(i\in \{2,3,\ldots ,m\}\). Thus, to minimize the objective function \(\rho =\max \{b_1,b_2,\ldots ,b_m\}\), each \(s_i\) must be as small as possible. Together with (7.1d), this gives \(s_1=s_2=\cdots =s_m\). So we can simplify (7.2) into the following equation
For (7.3), we have that
Since \(\forall j\in \{1,2,\ldots ,m\}\), \(b_j\) is monotonically increasing in \(s_1\), to minimize the objective function \(\rho =\max \{b_1,b_2,\ldots ,b_m\}\), \(s_1\) must be as small as possible, i.e. \(s_1=0\). Together with (7.1e), we can simplify (7.3) into the following
Let \(p\in \{1,2,\ldots ,m\}\) be an index such that \(b_p=\max \{b_1,b_2,\ldots ,b_m\}\). For (7.4), \(\forall j \in \{1,2,\ldots ,p-1,p+1,\ldots ,m\}\), clearly \(\frac{{\partial {b_p}}}{{\partial {b_j}}} < 0\). Hence, for every \(j\ne p\), \(b_p\) is monotonically decreasing in \(b_j\). To minimize \(b_p\), each \(b_j\) must be as large as possible subject to the constraints \(b_j\le b_p\) and (7.5).
First, \(b_1\) should be equal to \(b_p\). Then, if we temporarily ignore the first inequality in (7.5), \(\forall j\in \{2,3,\ldots ,m\}\), \(b_j\) can freely increase to \(b_p\). Taking the first inequality into account, however, may prevent some \(b_j\) from reaching \(b_p\). We denote the last \(b_j\) that cannot reach \(b_p\) by \(b_k\), i.e. \(\{b_2,b_3,\ldots ,b_k\}\) cannot increase to \(b_p\). To minimize \(b_p\), \(\forall i\in \{2,3,\ldots ,k\}\), \(b_i\) can increase to \(\frac{i-1}{m}b_p+1\) so that \(\frac{b_p}{m}=\frac{b_i-1}{i-1}\). Lastly, \(\{b_{k+1},b_{k+2},\ldots ,b_m\}\) can increase to \(b_p\). Eventually, after each \(b_j\) becomes as large as possible, (7.4) turns into
where k satisfies
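From (7.6) and (7.7), we have:
\[\rho ^*=\frac{m+m(\Psi (m)-\Psi (k))}{k+m(\Psi (m)-\Psi (k))},\]
where \(\Psi (n+1)=\sum _{i=1}^n \frac{1}{i}-\gamma \), \(\gamma \) is Euler's constant, and k is the root of \(k =\Big \lfloor {\frac{m-k}{1+\Psi (m)-\Psi (k)}+1 }\Big \rfloor \).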
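As a rough numerical illustration of the closed form \(\rho ^*=\frac{m+m(\Psi (m)-\Psi (k))}{k+m(\Psi (m)-\Psi (k))}\), the sketch below evaluates it for a given number of machines m. It relies on the identity \(\Psi (m)-\Psi (k)=\sum _{i=k}^{m-1}\frac{1}{i}\), which follows from the definition of \(\Psi \) above, and it finds k by brute-force search over \(\{1,\ldots ,m\}\). The function names (`harmonic_gap`, `roots_k`, `lower_bound`) are hypothetical, and the assumption that a root exists in that range is ours, not a claim taken from the paper.

```python
from fractions import Fraction
from math import floor


def harmonic_gap(m, k):
    """Psi(m) - Psi(k) for integers 1 <= k <= m, i.e. sum_{i=k}^{m-1} 1/i."""
    return sum((Fraction(1, i) for i in range(k, m)), Fraction(0))


def roots_k(m):
    """All k in {1, ..., m} with k == floor((m - k) / (1 + Psi(m) - Psi(k)) + 1)."""
    roots = []
    for k in range(1, m + 1):
        d = harmonic_gap(m, k)
        if floor(Fraction(m - k) / (1 + d) + 1) == k:
            roots.append(k)
    return roots


def lower_bound(m):
    """Return (k, rho_star) as exact values, using the first root found.

    Assumption: the equation has a root in {1, ..., m}; otherwise we raise.
    """
    ks = roots_k(m)
    if not ks:
        raise ValueError(f"no integer root k found for m={m}")
    k = ks[0]
    d = harmonic_gap(m, k)
    return k, (m + m * d) / (k + m * d)


if __name__ == "__main__":
    for m in (2, 3, 4):
        k, rho = lower_bound(m)
        print(m, k, rho, float(rho))
```

For m = 4, for instance, the search gives k = 2 and \(\rho ^*=\frac{11}{8}\). Exact rational arithmetic is used so that the floor in the equation for k is not affected by floating-point rounding.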
Appendix 2: The proof of \(\rho ^*<\frac{3}{2}\)
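To prove \(\rho ^*=\frac{m+m(\Psi (m)-\Psi (k))}{k+m(\Psi (m)-\Psi (k))}<\frac{3}{2}\), it suffices to prove
\[\frac{3k}{m}+\Psi (m)-\Psi (k)>2, \qquad (8.1)\]
since cross-multiplying \(\rho ^*<\frac{3}{2}\) gives \(2m+2m(\Psi (m)-\Psi (k))<3k+3m(\Psi (m)-\Psi (k))\), i.e. \(2m<3k+m(\Psi (m)-\Psi (k))\).
Since the exact value of \(\frac{{3k}}{m} + \Psi (m)-\Psi (k)\) in (8.1) is hard to obtain, we bound it with an inequality. DeTemple (1993) shows that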
Substituting (8.2) into (8.1), we have
After calculation, for \(k \ge 2\), we have
For \(k=1\), \(\frac{{3k}}{m} + \Psi (m)-\Psi (k)=\frac{{3}}{m} + \Psi (m) + \gamma \), which is nondecreasing in m when \(m\ge 2\): going from m to \(m+1\) changes it by \(\frac{3}{m+1}-\frac{3}{m}+\frac{1}{m}=\frac{m-2}{m(m+1)}\ge 0\). Hence, \(\frac{{3}}{m} + \Psi (m) + \gamma \ge \frac{3}{2}+\Psi (2)+ \gamma =\frac{5}{2}>2\). From the above we conclude that \(\rho ^*<\frac{3}{2}\).
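The inequality \(\frac{3k}{m}+\Psi (m)-\Psi (k)>2\) used in this appendix can also be sanity-checked numerically. The sketch below is only an illustration, not part of the proof: it evaluates the left-hand side exactly for every integer pair \(1\le k\le m\) up to a chosen bound (a larger set than the proof needs, since only the root k matters there), and it lists the \(k=1\) values \(\frac{3}{m}+\Psi (m)+\gamma \). The function names and the bound `max_m` are hypothetical choices.

```python
from fractions import Fraction


def smallest_lhs(max_m):
    """Smallest value of 3k/m + sum_{i=k}^{m-1} 1/i over 2 <= m <= max_m, 1 <= k <= m.

    Returns (value, m, k) for the pair where the minimum is attained.
    """
    worst = None
    for m in range(2, max_m + 1):
        gap = Fraction(0)  # Psi(m) - Psi(k), accumulated as k runs from m down to 1
        for k in range(m, 0, -1):
            if k < m:
                gap += Fraction(1, k)
            value = Fraction(3 * k, m) + gap
            if worst is None or value < worst[0]:
                worst = (value, m, k)
    return worst


def k1_values(max_m):
    """Values of 3/m + Psi(m) + gamma = 3/m + sum_{i=1}^{m-1} 1/i for m = 2..max_m."""
    values = []
    h = Fraction(0)  # running harmonic sum H_{m-1}
    for m in range(2, max_m + 1):
        h += Fraction(1, m - 1)
        values.append(Fraction(3, m) + h)
    return values


if __name__ == "__main__":
    value, m, k = smallest_lhs(60)
    print(float(value), m, k)
    print([float(v) for v in k1_values(6)])
```

The first two \(k=1\) values (m = 2 and m = 3) both equal \(\frac{5}{2}\), which matches the zero increment at \(m=2\) in the formula \(\frac{m-2}{m(m+1)}\).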
Cite this article
Chen, C., Xu, Y., Zhu, Y. et al. Online MapReduce scheduling problem of minimizing the makespan. J Comb Optim 33, 590–608 (2017). https://doi.org/10.1007/s10878-015-9982-7
DOI: https://doi.org/10.1007/s10878-015-9982-7