Online MapReduce scheduling problem of minimizing the makespan

Chen, Cong; Xu, Yinfeng; Zhu, Yuqing; Sun, Chengyu

doi:10.1007/s10878-015-9982-7

Published: 21 December 2015

Volume 33, pages 590–608, (2017)

Journal of Combinatorial Optimization

Cong Chen¹,
Yinfeng Xu¹,²,
Yuqing Zhu³ &
Chengyu Sun³

Abstract

The MapReduce system is a popular big data processing framework, and its performance is closely related to the efficiency of its centralized scheduler. In practice, the centralized scheduler often has little information in advance, which means each job may become known only after it is released. In this paper, we therefore consider the online MapReduce scheduling problem of minimizing the makespan, where jobs are released over time. Both the preemptive and non-preemptive versions of the problem are considered. In addition, we assume that reduce tasks cannot be parallelized because they are often complex and hard to decompose. For the non-preemptive version, we prove a lower bound of $\frac{m+m(\Psi (m)-\Psi (k))}{k+m(\Psi (m)-\Psi (k))}$, which is higher than that of the basic online machine scheduling problem, where k is the root of the equation $k=\big \lfloor {\frac{m-k}{1+\Psi (m)-\Psi (k)}+1 }\big \rfloor $ and m is the number of machines. Then we devise a $(2-\frac{1}{m})$-competitive online algorithm called MF-LPT (Map First-Longest Processing Time) based on LPT. For the preemptive version, we present a 1-competitive algorithm for two machines.

References

Chang H, Kodialam M, Kompella R, Lakshman T, Lee M, Mukherjee S (2011) Scheduling in mapreduce-like systems for fast completion time. In: INFOCOM, 2011 Proceedings IEEE, pp 3074–3082. doi:10.1109/INFCOM.2011.5935152
Chen B, Vestjens A (1997) Scheduling on identical machines: how good is lpt in an on-line setting? Oper Res Lett 21(4):165–169. doi:10.1016/S0167-6377(97)00040-0
Chen F, Kodialam M, Lakshman TV (2012) Joint scheduling of processing and shuffle phases in mapreduce systems. In: INFOCOM, 2012 Proceedings IEEE, pp 1143–1151. doi:10.1109/INFCOM.2012.6195473
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113. doi:10.1145/1327452.1327492
DeTemple DW (1993) A quicker convergence to euler’s constant. Am Math Mon 100(5):468–470
Fiat A, Woeginger G (1998) Competitive analysis of algorithms. In: Fiat A, Woeginger G (eds) Online algorithms, lecture notes in computer science, vol 1442. Springer, Berlin, pp 1–12. doi:10.1007/BFb0029562
Guo S, Kang L (2013) Online scheduling of parallel jobs with preemption on two identical machines. Oper Res Lett 41(2):207–209. doi:10.1016/j.orl.2013.01.002
Isard M, Prabhakaran V, Currey J, Wieder U, Talwar K, Goldberg A (2009) Quincy: fair scheduling for distributed computing clusters. In: Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, ACM, pp 261–276. doi:10.1145/1629575.1629601
Luo T, Zhu Y, Wu W, Xu Y, Du DZ (2015) Online makespan minimization in mapreduce-like systems with complex reduce tasks. Optim Lett. doi:10.1007/s11590-015-0902-7
Moseley B, Dasgupta A, Kumar R, Sarlós T (2011) On scheduling in map-reduce and flow-shops. In: Proceedings of the twenty-third annual ACM symposium on parallelism in algorithms and architectures, ACM, SPAA ’11, pp 289–298. doi:10.1145/1989493.1989540
Olver FW (2010) NIST handbook of mathematical functions. Cambridge University Press, Cambridge
Pinedo M (2012) Parallel machine models (deterministic). In: Scheduling, Springer US, pp 111–149. doi:10.1007/978-1-4614-2361-4_5
Sandholm T, Lai K (2009) Mapreduce optimization using regulated dynamic prioritization. SIGMETRICS Perform Eval Rev 37(1):299–310. doi:10.1145/2492101.1555384
Tan J, Meng X, Zhang L (2012) Performance analysis of coupling scheduler for mapreduce/hadoop. In: INFOCOM, 2012 Proceedings IEEE, pp 2586–2590. doi:10.1109/INFCOM.2012.6195658
Wang W, Zhu K, Ying L, Tan J, Zhang L (2013) Map task scheduling in mapreduce with data locality: Throughput and heavy-traffic optimality. In: INFOCOM, 2013 Proceedings IEEE, pp 1609–1617. doi:10.1109/INFCOM.2013.6566957
Yuan Y, Wang D, Liu J (2014) Joint scheduling of mapreduce jobs with servers: performance bounds and experiments. In: INFOCOM, 2014 Proceedings IEEE, pp 2175–2183. doi:10.1109/INFOCOM.2014.6848160
Zaharia M, Konwinski A, Joseph A, Katz R, Stoica I (2008) Improving mapreduce performance in heterogeneous environments. In: 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI 08)
Zaharia M, Borthakur D, Sarma JS, Elmeleegy K, Shenker S, Stoica I (2010) Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th European conference on Computer systems, ACM, pp 265–278. doi:10.1145/1755913.1755940
Zheng Y, Shroff N, Sinha P (2013) A new analytical technique for designing provably efficient mapreduce schedulers. In: INFOCOM, 2013 Proceedings IEEE, pp 1600–1608. doi:10.1109/INFCOM.2013.6566956
Zhu Y, Jiang Y, Wu W, Ding L, Teredesai A, Li D, Lee W (2014) Minimizing makespan and total completion time in mapreduce-like systems. In: INFOCOM, 2014 Proceedings IEEE, pp 2166–2174. doi:10.1109/INFOCOM.2014.6848159

Acknowledgments

The authors sincerely thank Sisi Zhao, the editor and the two referees for their comments and suggestions, which have improved this paper considerably. This work was partially supported by the National Natural Science Foundation of China (Grant 61221063), the Program for Changjiang Scholars and Innovative Research Team in University (IRT1173), and China Postdoctoral Science Foundation (2015T81040).

Author information

Authors and Affiliations

School of Management, Xi’an Jiaotong University, Xi’an, 710049, China
Cong Chen & Yinfeng Xu
The State Key Lab for Manufacturing Systems Engineering, Xi’an, 710049, China
Yinfeng Xu
Department of Computer Science, California State University, Los Angeles, CA, 90032, USA
Yuqing Zhu & Chengyu Sun

Corresponding author

Correspondence to Cong Chen.

Appendices

Appendix 1: The solution of $\rho ^*$

To make (3.5) easier to solve, we transform it into:

$$\begin{aligned} \min \quad \rho =\max \{b_1,b_2,\ldots ,b_m\} \end{aligned}$$

Substituting (7.1b) and (7.1c) into (7.1a), we have

$$\begin{aligned} \sum _{i=1}^m x_i = \frac{b_1}{m}+\sum _{i=1}^{m-1}\frac{(b_{m-i+1}-1)(1-i\cdot s_1)}{m-i} +(m-1)s_1 = \sum _{i=1}^m s_i+1 \end{aligned}$$

(7.2)

Noticing that, in (7.2), $\frac{\partial b_1}{\partial s_i}=m>0 ~ (\forall i\in \{2,3,\ldots ,m\}) $ and $\frac{\partial b_j}{\partial s_i}=\frac{j-1}{1-(m-j+1)s_1}>0 ~ (\forall i,j\in \{2,3,\ldots ,m\})$, we see that, for every $j\in \{1,2,\ldots ,m\}$, $b_j$ is monotonically increasing in $s_i$ for any $i\in \{2,3,\ldots ,m\}$. Thus, to minimize the objective function $\rho =\max \{b_1,b_2,\ldots ,b_m\}$, each $s_i$ must be as small as possible. Together with (7.1d), we conclude that $s_1=s_2=\cdots =s_m$, so that $\sum _{i=1}^m s_i = m s_1$ and (7.2) simplifies to the following equation

$$\begin{aligned} \frac{b_1}{m}+\sum _{i=1}^{m-1}\frac{(b_{m-i+1}-1)(1-i\cdot s_1)}{m-i}-s_1 = 1 \end{aligned}$$

(7.3)

For (7.3), we have that

$$\begin{aligned} \left\{ \! \begin{aligned} \frac{{\partial {b_1}}}{{\partial {s_1}}}&= m\left( {1 + \sum \limits _{i = 1}^{m - 1} {\frac{{\left( {{b_{m + 1 - i}} - 1} \right) \cdot i}}{{\left( {m - i} \right) }}} } \right) > 0\\ \frac{{\partial {b_j}}}{{\partial {s_1}}}&= \frac{{\left( {j - 1} \right) \left( {1 + \sum \nolimits _{i = 1}^{m - 1} {\frac{{\left( {{b_{m + 1 - i}} - 1} \right) \cdot i}}{{\left( {m - i} \right) }}} } \right) }}{{1 - \left( {m - j + 1} \right) {s_1}}} > 0,\quad j = 2,3,\ldots ,m \end{aligned} \right. \end{aligned}$$

Since, for every $j\in \{1,2,\ldots ,m\}$, $b_j$ is monotonically increasing in $s_1$, minimizing the objective function $\rho =\max \{b_1,b_2,\ldots ,b_m\}$ requires $s_1$ to be as small as possible, i.e. $s_1=0$. Together with (7.1e), (7.3) simplifies to the following

$$\begin{aligned}&\frac{{{b_1}}}{m} + \frac{{{b_2} - 1}}{1} + \frac{{{b_3} - 1}}{2} + \dots + \frac{{{b_m} - 1}}{{m - 1}} = 1 \end{aligned}$$

(7.4)

$$\begin{aligned}&\frac{{{b_1}}}{m} \ge \frac{{{b_2} - 1}}{1} \ge \frac{{{b_3} - 1}}{2} \ge \dots \ge \frac{{{b_m} - 1}}{{m - 1}} \end{aligned}$$

(7.5)

Let $p\in \{1,2,\ldots ,m\}$ be an index such that $b_p=\max \{b_1,b_2,\ldots ,b_m\}$. In (7.4), $\frac{{\partial {b_p}}}{{\partial {b_j}}} < 0$ for every $j \in \{1,2,\ldots ,p-1,p+1,\ldots ,m\}$, because all coefficients are positive and the sum is fixed at 1. Hence, $b_p$ is monotonically decreasing in each such $b_j$. To minimize $b_p$, each $b_j$ must be as large as possible subject to the constraints $b_j\le b_p$ and (7.5).

First, $b_1$ should be equal to $b_p$. Then, if we temporarily ignore the first inequality in (7.5), every $b_j$ with $j\in \{2,3,\ldots ,m\}$ can freely increase to $b_p$. Taking the first inequality into account, however, some $b_j$ may not be able to reach $b_p$. We denote the last $b_j$ that cannot reach $b_p$ by $b_k$, i.e. $\{b_2,b_3,\ldots ,b_k\}$ cannot increase to $b_p$. To minimize $b_p$, each $b_i$ with $i\in \{2,3,\ldots ,k\}$ increases to $\frac{i-1}{m}b_p+1$, so that $\frac{b_p}{m}=\frac{b_i-1}{i-1}$. Lastly, $\{b_{k+1},b_{k+2},\ldots ,b_m\}$ increase to $b_p$. After every $b_j$ has become as large as possible, (7.4) turns into

$$\begin{aligned} \begin{array}{l} \underbrace{\frac{{{b_p}}}{m} + \frac{{{b_p}}}{m} + \cdots +\frac{{{b_p}}}{m}}_k + \underbrace{\frac{{{b_p} - 1}}{{{k}}} + \frac{{{b_p} - 1}}{{{k} + 1}} + \cdots + \frac{{{b_p} - 1}}{{m - 1}}}_{m - k} \\ = \displaystyle {k} \cdot \frac{{{b_p}}}{m} + \left( {{b_p} - 1} \right) {\sum _{i={k}}^{m - 1} {\frac{1}{i}} } = 1 \end{array} \end{aligned}$$

(7.6)

where k satisfies

$$\begin{aligned} \left\{ \! \begin{aligned} \frac{{{b_p} - 1}}{{k - 1}}&\ge \textstyle \frac{{{b_p}}}{m}\\ \frac{{{b_p} - 1}}{k}&< \textstyle \frac{{{b_p}}}{m}\\ \end{aligned} \right. \end{aligned}$$

(7.7)
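The construction above can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names `solve_bp`, `build_b` and `satisfies_constraints` are hypothetical, exact rational arithmetic is used to avoid rounding, and for $k=1$ the first inequality of (7.7) is treated as vacuous because its denominator $k-1$ is zero. It scans k, solves (7.6) for $b_p$, keeps the k that satisfies (7.7), and then rebuilds the vector $(b_1,\ldots ,b_m)$ to confirm (7.4) and (7.5).

```python
from fractions import Fraction


def solve_bp(m):
    """Return (k, b_p) satisfying (7.6) and (7.7) for m machines."""
    for k in range(1, m + 1):
        # h = sum_{i=k}^{m-1} 1/i, the sum appearing in (7.6)
        h = sum((Fraction(1, i) for i in range(k, m)), Fraction(0))
        # (7.6): k*b_p/m + (b_p - 1)*h = 1  =>  b_p = (1 + h) / (k/m + h)
        bp = (1 + h) / (Fraction(k, m) + h)
        # (7.7); assumption: the first inequality is vacuous when k = 1
        first = k == 1 or (bp - 1) / (k - 1) >= bp / m
        second = (bp - 1) / k < bp / m
        if first and second:
            return k, bp
    raise ValueError("no k satisfies (7.7)")


def build_b(m):
    """Rebuild (b_1, ..., b_m) as described in the text."""
    k, bp = solve_bp(m)
    b = [bp]                                                     # b_1 = b_p
    b += [Fraction(i - 1, m) * bp + 1 for i in range(2, k + 1)]  # b_2 .. b_k
    b += [bp] * (m - k)                                          # b_{k+1} .. b_m
    return b


def satisfies_constraints(m):
    """Check (7.4), the chain (7.5), and that b_p is the maximum."""
    b = build_b(m)
    # Terms of (7.4): b_1/m, then (b_j - 1)/(j - 1) for j = 2..m
    terms = [b[0] / m] + [(b[j] - 1) / j for j in range(1, m)]
    chain_ok = all(x >= y for x, y in zip(terms, terms[1:]))
    return sum(terms) == 1 and chain_ok and max(b) == b[0]


if __name__ == "__main__":
    for m in (2, 3, 4, 10):
        k, bp = solve_bp(m)
        print(m, k, bp, satisfies_constraints(m))
```

For $m=4$, for instance, the scan selects $k=2$ with $b_p=\frac{11}{8}$, and the four terms of (7.4) are $\frac{11}{32},\frac{11}{32},\frac{6}{32},\frac{4}{32}$, which sum to 1.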

From (7.6) and (7.7), we have:

$$\begin{aligned} \rho ^* =b_p = \frac{m+m(\Psi (m)-\Psi (k))}{k+m(\Psi (m)-\Psi (k))} \end{aligned}$$

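where $\Psi $ denotes the digamma function, which satisfies $\Psi (n+1)=\sum _{i=1}^n \frac{1}{i}-\gamma $ for positive integers n with $\gamma $ being Euler's constant, and k is the root of $k =\Big \lfloor {\frac{m-k}{1+\Psi (m)-\Psi (k)}+1 }\Big \rfloor $.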
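To make the closed form concrete, the sketch below evaluates it for a given m. It is a minimal illustration, not the authors' code: the names `psi_diff` and `lower_bound` are hypothetical, and it relies on the observation that for integers $m\ge k\ge 1$ the difference $\Psi (m)-\Psi (k)$ equals $\sum _{i=k}^{m-1}\frac{1}{i}$, so that $\gamma $ cancels and exact fractions suffice. The root k is found by simply testing each candidate against the floor equation.

```python
from fractions import Fraction
from math import floor


def psi_diff(m, k):
    """Psi(m) - Psi(k) for integers m >= k >= 1 (the gamma terms cancel)."""
    return sum((Fraction(1, i) for i in range(k, m)), Fraction(0))


def lower_bound(m):
    """Return (k, rho_star) for m machines, using the closed form above."""
    for k in range(1, m + 1):
        d = psi_diff(m, k)
        # k must be a root of k = floor((m - k) / (1 + Psi(m) - Psi(k)) + 1)
        if k == floor(Fraction(m - k) / (1 + d) + 1):
            rho = (m + m * d) / (k + m * d)
            return k, rho
    raise ValueError("no root k found")


if __name__ == "__main__":
    for m in (2, 3, 4, 8, 16, 64):
        k, rho = lower_bound(m)
        print(f"m={m:3d}  k={k:3d}  rho*={float(rho):.4f}")
```

For small cases the values can be verified by hand: $m=2$ gives $k=1$ and $\rho ^*=\frac{4}{3}$, $m=3$ gives $k=1$ and $\rho ^*=\frac{15}{11}$, and $m=4$ gives $k=2$ and $\rho ^*=\frac{11}{8}$.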

Appendix 2: The proof of $\rho ^*<\frac{3}{2}$

To prove $\rho ^*=\frac{m+m(\Psi (m)-\Psi (k))}{k+m(\Psi (m)-\Psi (k))}<\frac{3}{2}$, we just need to prove

$$\begin{aligned} \frac{{3k}}{m} + \Psi (m)-\Psi (k) > 2 \end{aligned}$$

(8.1)

Since the exact value of $\frac{{3k}}{m} + \Psi (m)-\Psi (k)$ in (8.1) is hard to obtain, we bound it with an inequality. DeTemple (1993) shows that

$$\begin{aligned} \frac{1}{24n^2} < \Psi (n)-\ln \left( n-\frac{1}{2}\right) < \frac{1}{24(n-1)^2} \end{aligned}$$

(8.2)
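Inequality (8.2) can be sanity-checked numerically for integer n. The sketch below is an illustration only: the helper names are hypothetical, $\Psi (n)$ is evaluated at integers through the identity $\Psi (n)=\sum _{i=1}^{n-1}\frac{1}{i}-\gamma $, and $\gamma $ is hard-coded to double precision. A finite scan of this kind does not replace the proof in DeTemple (1993).

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant to double precision


def psi_int(n):
    """Digamma at a positive integer n: Psi(n) = H_{n-1} - gamma."""
    return math.fsum(1.0 / i for i in range(1, n)) - EULER_GAMMA


def detemple_gap(n):
    """The middle quantity of (8.2): Psi(n) - ln(n - 1/2)."""
    return psi_int(n) - math.log(n - 0.5)


def check_82(n):
    """True if 1/(24 n^2) < Psi(n) - ln(n - 1/2) < 1/(24 (n-1)^2), for n >= 2."""
    lower = 1.0 / (24 * n**2)
    upper = 1.0 / (24 * (n - 1) ** 2)
    return lower < detemple_gap(n) < upper


if __name__ == "__main__":
    for n in (2, 3, 10, 100):
        print(n, detemple_gap(n), check_82(n))
```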

Substituting (8.2) into (8.1), we have

$$\begin{aligned} \frac{3k}{m} + \Psi (m)-\Psi (k) > \frac{{3k}}{m} + \frac{1}{{24{m^2}}} - \frac{1}{{24{{\left( {k - 1} \right) }^2}}} + \ln \left( \frac{m - \frac{1}{2}}{k - \frac{1}{2}} \right) \end{aligned}$$

After calculation, for $k \ge 2$, we have

$$\begin{aligned} \frac{{3k}}{m} + \frac{1}{{24{m^2}}} - \frac{1}{{24{{\left( {k - 1} \right) }^2}}} + \ln \left( \frac{m - \frac{1}{2}}{k - \frac{1}{2}} \right) > 2 \end{aligned}$$

For $k=1$, $\frac{{3k}}{m} + \Psi (m)-\Psi (k)=\frac{{3}}{m} + \Psi (m) + \gamma $, which is non-decreasing in m for $m\ge 2$. Hence, $\frac{{3}}{m} + \Psi (m) + \gamma \ge \frac{3}{2}+\Psi (2)+ \gamma =\frac{5}{2}>2$. Therefore (8.1) holds in both cases, and we conclude that $\rho ^*<\frac{3}{2}$.
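The two cases of this argument can be explored with a short numerical scan. The sketch below is a finite check under stated assumptions, not a proof: the function names and the cut-off `M_MAX` are hypothetical choices, and the scan covers every pair $2\le k\le m$ rather than only the root k. It evaluates the bound obtained from (8.2) for $k\ge 2$, and the expression $\frac{3}{m}+\Psi (m)+\gamma =\frac{3}{m}+\sum _{i=1}^{m-1}\frac{1}{i}$ for $k=1$.

```python
import math


def bound_k_ge_2(k, m):
    """Lower bound on 3k/m + Psi(m) - Psi(k) obtained from (8.2), for 2 <= k <= m."""
    return (3 * k / m
            + 1 / (24 * m**2)
            - 1 / (24 * (k - 1) ** 2)
            + math.log((m - 0.5) / (k - 0.5)))


def value_k_eq_1(m):
    """3/m + Psi(m) + gamma = 3/m + sum_{i=1}^{m-1} 1/i, for m >= 2."""
    return 3 / m + math.fsum(1 / i for i in range(1, m))


M_MAX = 150  # arbitrary cut-off chosen for this illustration

# Smallest value of the k >= 2 bound over all scanned pairs (k, m)
smallest_k_ge_2 = min(
    bound_k_ge_2(k, m)
    for m in range(2, M_MAX + 1)
    for k in range(2, m + 1)
)

# Smallest value of the k = 1 expression over the scanned m
smallest_k_eq_1 = min(value_k_eq_1(m) for m in range(2, M_MAX + 1))

if __name__ == "__main__":
    print("k >= 2 bound exceeds 2:", smallest_k_ge_2 > 2)
    print("k = 1 minimum:", smallest_k_eq_1)
```

The $k=1$ minimum is attained at $m=2$ (and again at $m=3$), where the expression equals $\frac{5}{2}$, in line with the text.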

About this article

Cite this article

Chen, C., Xu, Y., Zhu, Y. et al. Online MapReduce scheduling problem of minimizing the makespan. J Comb Optim 33, 590–608 (2017). https://doi.org/10.1007/s10878-015-9982-7

Published: 21 December 2015
Issue Date: February 2017
DOI: https://doi.org/10.1007/s10878-015-9982-7
