
Online MapReduce scheduling problem of minimizing the makespan

Journal of Combinatorial Optimization

Abstract

MapReduce is a popular framework for big data processing, and its performance is closely tied to the efficiency of its centralized scheduler. In practice, the scheduler often has little advance information: each job may become known only when it is released. In this paper we therefore consider the online MapReduce scheduling problem of minimizing the makespan, where jobs are released over time. Both the preemptive and the non-preemptive versions of the problem are considered. In addition, we assume that reduce tasks cannot be parallelized, because they are often complex and hard to decompose. For the non-preemptive version, we prove a lower bound of \(\frac{m+m(\Psi (m)-\Psi (k))}{k+m(\Psi (m)-\Psi (k))}\), which is higher than that of the basic online machine scheduling problem, where k is the root of the equation \(k=\big \lfloor {\frac{m-k}{1+\Psi (m)-\Psi (k)}+1 }\big \rfloor \) and m is the number of machines. We then devise a \((2-\frac{1}{m})\)-competitive online algorithm called MF-LPT (Map First-Longest Processing Time), based on LPT. For the preemptive version, we present a 1-competitive algorithm for two machines.

References

  • Chang H, Kodialam M, Kompella R, Lakshman T, Lee M, Mukherjee S (2011) Scheduling in MapReduce-like systems for fast completion time. In: INFOCOM, 2011 Proceedings IEEE, pp 3074–3082. doi:10.1109/INFCOM.2011.5935152

  • Chen B, Vestjens A (1997) Scheduling on identical machines: how good is LPT in an on-line setting? Oper Res Lett 21(4):165–169. doi:10.1016/S0167-6377(97)00040-0

  • Chen F, Kodialam M, Lakshman TV (2012) Joint scheduling of processing and shuffle phases in MapReduce systems. In: INFOCOM, 2012 Proceedings IEEE, pp 1143–1151. doi:10.1109/INFCOM.2012.6195473

  • Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113. doi:10.1145/1327452.1327492

  • DeTemple DW (1993) A quicker convergence to Euler’s constant. Am Math Mon 100(5):468–470

  • Fiat A, Woeginger G (1998) Competitive analysis of algorithms. In: Fiat A, Woeginger G (eds) Online algorithms, lecture notes in computer science, vol 1442. Springer, Berlin, pp 1–12. doi:10.1007/BFb0029562

  • Guo S, Kang L (2013) Online scheduling of parallel jobs with preemption on two identical machines. Oper Res Lett 41(2):207–209. doi:10.1016/j.orl.2013.01.002

  • Isard M, Prabhakaran V, Currey J, Wieder U, Talwar K, Goldberg A (2009) Quincy: fair scheduling for distributed computing clusters. In: Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, ACM, pp 261–276. doi:10.1145/1629575.1629601

  • Luo T, Zhu Y, Wu W, Xu Y, Du DZ (2015) Online makespan minimization in MapReduce-like systems with complex reduce tasks. Optim Lett. doi:10.1007/s11590-015-0902-7

  • Moseley B, Dasgupta A, Kumar R, Sarlós T (2011) On scheduling in map-reduce and flow-shops. In: Proceedings of the twenty-third annual ACM symposium on parallelism in algorithms and architectures, ACM, SPAA ’11, pp 289–298. doi:10.1145/1989493.1989540

  • Olver FW (2010) NIST handbook of mathematical functions. Cambridge University Press, Cambridge

  • Pinedo M (2012) Parallel machine models (deterministic). In: Scheduling, Springer US, pp 111–149. doi:10.1007/978-1-4614-2361-4_5

  • Sandholm T, Lai K (2009) MapReduce optimization using regulated dynamic prioritization. SIGMETRICS Perform Eval Rev 37(1):299–310. doi:10.1145/2492101.1555384

  • Tan J, Meng X, Zhang L (2012) Performance analysis of coupling scheduler for MapReduce/Hadoop. In: INFOCOM, 2012 Proceedings IEEE, pp 2586–2590. doi:10.1109/INFCOM.2012.6195658

  • Wang W, Zhu K, Ying L, Tan J, Zhang L (2013) Map task scheduling in MapReduce with data locality: throughput and heavy-traffic optimality. In: INFOCOM, 2013 Proceedings IEEE, pp 1609–1617. doi:10.1109/INFCOM.2013.6566957

  • Yuan Y, Wang D, Liu J (2014) Joint scheduling of MapReduce jobs with servers: performance bounds and experiments. In: INFOCOM, 2014 Proceedings IEEE, pp 2175–2183. doi:10.1109/INFOCOM.2014.6848160

  • Zaharia M, Konwinski A, Joseph A, Katz R, Stoica I (2008) Improving MapReduce performance in heterogeneous environments. In: 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI 08)

  • Zaharia M, Borthakur D, Sarma JS, Elmeleegy K, Shenker S, Stoica I (2010) Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th European conference on Computer systems, ACM, pp 265–278. doi:10.1145/1755913.1755940

  • Zheng Y, Shroff N, Sinha P (2013) A new analytical technique for designing provably efficient MapReduce schedulers. In: INFOCOM, 2013 Proceedings IEEE, pp 1600–1608. doi:10.1109/INFCOM.2013.6566956

  • Zhu Y, Jiang Y, Wu W, Ding L, Teredesai A, Li D, Lee W (2014) Minimizing makespan and total completion time in MapReduce-like systems. In: INFOCOM, 2014 Proceedings IEEE, pp 2166–2174. doi:10.1109/INFOCOM.2014.6848159

Acknowledgments

The authors sincerely thank Sisi Zhao, the editor and the two referees for their comments and suggestions, which have improved this paper considerably. This work was partially supported by the National Natural Science Foundation of China (Grant 61221063), the Program for Changjiang Scholars and Innovative Research Team in University (IRT1173), and China Postdoctoral Science Foundation (2015T81040).

Author information

Corresponding author

Correspondence to Cong Chen.

Appendices

Appendix 1: The solution of \(\rho ^*\)

To make (3.5) easier to solve, we transform it into:

$$\begin{aligned} \min \quad \rho =\max \{b_1,b_2,\ldots ,b_m\} \end{aligned}$$
[Figure c: the constraints (7.1a)–(7.1e) of this program; not recoverable from the source]

Substituting (7.1b) and (7.1c) into (7.1a), we have

$$\begin{aligned} \sum _{i=1}^m x_i = \frac{b_1}{m}+\sum _{i=1}^{m-1}\frac{(b_{m-i+1}-1)(1-i\cdot s_1)}{m-i} +(m-1)s_1 = \sum _{i=1}^m s_i+1 \end{aligned}$$
(7.2)

Note that, in (7.2), \(\frac{\partial b_1}{\partial s_i}=m>0 ~ (\forall i\in \{2,3,\ldots ,m\}) \) and \(\frac{\partial b_j}{\partial s_i}=\frac{j-1}{1-(m-j+1)s_1}>0 ~ (\forall i,j\in \{2,3,\ldots ,m\})\). Hence, for every \(j\in \{1,2,\ldots ,m\}\), \(b_j\) is monotonically increasing in \(s_i\) for any \(i\in \{2,3,\ldots ,m\}\). Thus, to minimize the objective function \(\rho =\max \{b_1,b_2,\ldots ,b_m\}\), each \(s_i\) must be as small as possible. Combining this with (7.1d), we conclude that \(s_1=s_2=\cdots =s_m\), so (7.2) simplifies to the following equation

$$\begin{aligned} \frac{b_1}{m}+\sum _{i=1}^{m-1}\frac{(b_{m-i+1}-1)(1-i\cdot s_1)}{m-i}-s_1 = 1 \end{aligned}$$
(7.3)

For (7.3), we have that

$$\begin{aligned} \left\{ \! \begin{aligned} \frac{{\partial {b_1}}}{{\partial {s_1}}}&= m\left( {1 + \sum \limits _{i = 1}^{m - 1} {\frac{{\left( {{b_{m + 1 - i}} - 1} \right) \cdot i}}{{\left( {m - i} \right) }}} } \right) > 0\\ \frac{{\partial {b_j}}}{{\partial {s_1}}}&= \frac{{\left( {j - 1} \right) \left( {1 + \sum \nolimits _{i = 1}^{m - 1} {\frac{{\left( {{b_{m + 1 - i}} - 1} \right) \cdot i}}{{\left( {m - i} \right) }}} } \right) }}{{1 - \left( {m - j + 1} \right) {s_1}}} > 0,\quad j = 2,3,\ldots ,m \end{aligned} \right. \end{aligned}$$

Since, for every \(j\in \{1,2,\ldots ,m\}\), \(b_j\) is monotonically increasing in \(s_1\), minimizing the objective function \(\rho =\max \{b_1,b_2,\ldots ,b_m\}\) requires \(s_1\) to be as small as possible, i.e. \(s_1=0\). Together with (7.1e), (7.3) then simplifies to the following

$$\begin{aligned}&\frac{{{b_1}}}{m} + \frac{{{b_2} - 1}}{1} + \frac{{{b_3} - 1}}{2} + \dots + \frac{{{b_m} - 1}}{{m - 1}} = 1 \end{aligned}$$
(7.4)
$$\begin{aligned}&\frac{{{b_1}}}{m} \ge \frac{{{b_2} - 1}}{1} \ge \frac{{{b_3} - 1}}{2} \ge \dots \ge \frac{{{b_m} - 1}}{{m - 1}} \end{aligned}$$
(7.5)

Let \(p\in \{1,2,\ldots ,m\}\) be an index such that \(b_p=\max \{b_1,b_2,\ldots ,b_m\}\). In (7.4) every coefficient is positive, so \(\frac{{\partial {b_p}}}{{\partial {b_j}}} < 0\) for all \(j \in \{1,2,\ldots ,p-1,p+1,\ldots ,m\}\). Hence \(b_p\) is monotonically decreasing in each such \(b_j\). To minimize \(b_p\), each \(b_j\) must be as large as possible subject to the constraints \(b_j\le b_p\) and (7.5).

First, \(b_1\) should be equal to \(b_p\). Next, if we temporarily ignore the first inequality in (7.5), every \(b_j\) with \(j\in \{2,3,\ldots ,m\}\) can increase freely to \(b_p\). Taking the first inequality into account, however, may prevent some \(b_j\) from reaching \(b_p\). Let \(b_k\) denote the last \(b_j\) that cannot reach \(b_p\), i.e. \(\{b_2,b_3,\ldots ,b_k\}\) cannot increase to \(b_p\). To minimize \(b_p\), each \(b_i\) with \(i\in \{2,3,\ldots ,k\}\) increases to \(\frac{i-1}{m}b_p+1\), so that \(\frac{b_p}{m}=\frac{b_i-1}{i-1}\). Finally, \(\{b_{k+1},b_{k+2},\ldots ,b_m\}\) can increase to \(b_p\). After each \(b_j\) has become as large as possible, (7.4) turns into

$$\begin{aligned} \begin{array}{l} \underbrace{\frac{b_p}{m} + \frac{b_p}{m} + \cdots +\frac{b_p}{m}}_{k} + \underbrace{\frac{b_p - 1}{k} + \frac{b_p - 1}{k + 1} + \cdots + \frac{b_p - 1}{m - 1}}_{m - k} \\ = \displaystyle k \cdot \frac{b_p}{m} + \left( b_p - 1 \right) \sum _{i=k}^{m - 1} \frac{1}{i} = 1 \end{array} \end{aligned}$$
(7.6)

where k satisfies

$$\begin{aligned} \left\{ \! \begin{aligned} \frac{{{b_p} - 1}}{{k - 1}}&\ge \textstyle \frac{{{b_p}}}{m}\\ \frac{{{b_p} - 1}}{k}&< \textstyle \frac{{{b_p}}}{m}\\ \end{aligned} \right. \end{aligned}$$
(7.7)

From (7.6) and (7.7), we have:

$$\begin{aligned} \rho ^* =b_p = \frac{m+m(\Psi (m)-\Psi (k))}{k+m(\Psi (m)-\Psi (k))} \end{aligned}$$

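where \(\Psi \) is the digamma function, which satisfies \(\Psi (n+1)=\sum _{i=1}^n \frac{1}{i}-\gamma \) for positive integers n with \(\gamma \) the Euler-Mascheroni constant, and k is the root of \(k =\Big \lfloor {\frac{m-k}{1+\Psi (m)-\Psi (k)}+1 }\Big \rfloor \).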
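
The step from (7.6) and (7.7) to this closed form can be spelled out as follows. The shorthand \(H=\Psi (m)-\Psi (k)=\sum _{i=k}^{m-1}\frac{1}{i}\) is introduced here only for brevity and is not the authors' notation. Solving (7.6) for \(b_p\) gives

$$\begin{aligned} b_p\left( \frac{k}{m}+H\right) = 1+H \quad \Longrightarrow \quad b_p=\frac{m+mH}{k+mH},\qquad b_p-1=\frac{m-k}{k+mH},\qquad \frac{b_p}{m}=\frac{1+H}{k+mH} \end{aligned}$$

Substituting these into (7.7), clearing the denominators \(k-1\) and \(k\), and cancelling the common positive factor \(\frac{1}{k+mH}\) yields

$$\begin{aligned} (k-1)(1+H)\le m-k< k(1+H) \quad \Longleftrightarrow \quad \frac{m-k}{1+H}< k\le \frac{m-k}{1+H}+1 \end{aligned}$$

Since k is an integer, this is the same as \(k=\big \lfloor \frac{m-k}{1+H}+1\big \rfloor \).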

Appendix 2: The proof of \(\rho ^*<\frac{3}{2}\)

To prove \(\rho ^*=\frac{m+m(\Psi (m)-\Psi (k))}{k+m(\Psi (m)-\Psi (k))}<\frac{3}{2}\), it suffices to prove

$$\begin{aligned} \frac{{3k}}{m} + \Psi (m)-\Psi (k) > 2 \end{aligned}$$
(8.1)

Since the exact value of \(\frac{{3k}}{m} + \Psi (m)-\Psi (k)\) in (8.1) is hard to obtain, we bound it with an inequality. DeTemple (1993) showed that

$$\begin{aligned} \frac{1}{24n^2} < \Psi (n)-\ln \left( n-\frac{1}{2}\right) < \frac{1}{24(n-1)^2} \end{aligned}$$
(8.2)
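
As a quick numerical sanity check of (8.2), the sketch below evaluates \(\Psi (n)\) through the identity \(\Psi (n+1)=\sum _{i=1}^n \frac{1}{i}-\gamma \) from Appendix 1 and compares it with both bounds. The names `psi_int` and `detemple_holds` and the tested range of n are illustrative assumptions; floating-point arithmetic can only support the inequality, not prove it.

```python
import math

# Euler-Mascheroni constant, hard-coded to double precision.
EULER_GAMMA = 0.5772156649015329


def psi_int(n: int) -> float:
    """Psi at a positive integer n: sum_{i=1}^{n-1} 1/i - gamma."""
    return math.fsum(1.0 / i for i in range(1, n)) - EULER_GAMMA


def detemple_holds(n: int) -> bool:
    """Check 1/(24 n^2) < Psi(n) - ln(n - 1/2) < 1/(24 (n-1)^2) for n >= 2."""
    middle = psi_int(n) - math.log(n - 0.5)
    lower = 1.0 / (24 * n**2)
    upper = 1.0 / (24 * (n - 1) ** 2)
    return lower < middle < upper


if __name__ == "__main__":
    # Report whether (8.2) holds numerically for n = 2, ..., 200.
    print(all(detemple_holds(n) for n in range(2, 201)))
```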

Substituting (8.2) into (8.1), we have

$$\begin{aligned} \frac{3k}{m} + \Psi (m)-\Psi (k) > \frac{{3k}}{m} + \frac{1}{{24{m^2}}} - \frac{1}{{24{{\left( {k - 1} \right) }^2}}} + \ln \left( \frac{m - \frac{1}{2}}{k - \frac{1}{2}} \right) \end{aligned}$$

After calculation, for \(k \ge 2\), we have

$$\begin{aligned} \frac{{3k}}{m} + \frac{1}{{24{m^2}}} - \frac{1}{{24{{\left( {k - 1} \right) }^2}}} + \ln \left( \frac{m - \frac{1}{2}}{k - \frac{1}{2}} \right) > 2 \end{aligned}$$
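
One way to carry out this calculation is sketched below; it is a reconstruction and not necessarily the authors' route. Write \(x=\frac{k}{m}\in (0,1]\) (a shorthand introduced here). Since \(k\le m\), we have \(\frac{m-\frac{1}{2}}{k-\frac{1}{2}}\ge \frac{m}{k}\), and since \(k\ge 2\), we have \(\frac{1}{24(k-1)^2}\le \frac{1}{24}\). Therefore

$$\begin{aligned} \frac{3k}{m} + \frac{1}{24m^2} - \frac{1}{24(k-1)^2} + \ln \left( \frac{m-\frac{1}{2}}{k-\frac{1}{2}}\right)&> 3x-\ln x-\frac{1}{24} \\&\ge 1+\ln 3-\frac{1}{24} \approx 2.057 > 2 \end{aligned}$$

The last step uses that \(3x-\ln x\) attains its minimum \(1+\ln 3\) at \(x=\frac{1}{3}\).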

For \(k=1\), \(\frac{{3k}}{m} + \Psi (m)-\Psi (k)=\frac{{3}}{m} + \Psi (m) + \gamma \), since \(\Psi (1)=-\gamma \). This quantity is non-decreasing in m for \(m\ge 2\), because increasing m by one changes it by \(\frac{1}{m}-\frac{3}{m(m+1)}=\frac{m-2}{m(m+1)}\ge 0\). Hence, \(\frac{{3}}{m} + \Psi (m) + \gamma \ge \frac{3}{2}+\Psi (2)+ \gamma =\frac{5}{2}>2\). Combining both cases, we conclude that \(\rho ^*<\frac{3}{2}\).
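
The conclusion can also be checked numerically for small m. The sketch below finds k by direct search over the fixed-point equation from Appendix 1 and evaluates \(\rho ^*\) exactly with rational arithmetic, using \(\Psi (m)-\Psi (k)=\sum _{i=k}^{m-1}\frac{1}{i}\). The helper names `harmonic_gap` and `lower_bound` are hypothetical, and scanning k upward from 1 and returning the first root is an assumption of this sketch rather than the authors' procedure.

```python
from fractions import Fraction
from math import floor


def harmonic_gap(k: int, m: int) -> Fraction:
    """Psi(m) - Psi(k) = sum_{i=k}^{m-1} 1/i for integers 1 <= k <= m."""
    return sum((Fraction(1, i) for i in range(k, m)), Fraction(0))


def lower_bound(m: int) -> tuple[int, Fraction]:
    """Return (k, rho*) for m machines.

    k is the first integer in 1..m satisfying
    k = floor((m - k) / (1 + Psi(m) - Psi(k)) + 1).
    """
    for k in range(1, m + 1):
        gap = harmonic_gap(k, m)
        if k == floor(Fraction(m - k) / (1 + gap) + 1):
            rho = (m + m * gap) / (k + m * gap)
            return k, rho
    raise ValueError(f"no root k found for m={m}")


if __name__ == "__main__":
    for m in (2, 3, 4, 10, 50):
        k, rho = lower_bound(m)
        # Print the exact fraction and whether it stays below 3/2.
        print(m, k, rho, rho < Fraction(3, 2))
```

For example, `lower_bound(2)` gives k = 1 and \(\rho ^*=\frac{4}{3}\), and `lower_bound(4)` gives k = 2 and \(\rho ^*=\frac{11}{8}\), both below \(\frac{3}{2}\).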

Cite this article

Chen, C., Xu, Y., Zhu, Y. et al. Online MapReduce scheduling problem of minimizing the makespan. J Comb Optim 33, 590–608 (2017). https://doi.org/10.1007/s10878-015-9982-7
