
On Scheduling Coflows

Algorithmica

Abstract

Applications designed for data-parallel computation frameworks such as MapReduce usually alternate between computation and communication stages. Coflow scheduling is a recent popular networking abstraction introduced to capture such application-level communication patterns in datacenters. In this framework, a datacenter is modeled as a single non-blocking switch with m input ports and m output ports. A coflow j is a collection of flow demands \(\{d^j_{io}\}_{i \in \{1,\ldots ,m\}, o \in \{1,\ldots ,m\}}\) that is said to be complete once all of its requisite flows have been scheduled. We consider the offline coflow scheduling problem with and without release times to minimize the total weighted completion time. Coflow scheduling generalizes the well-studied concurrent open shop scheduling problem and is thus NP-hard. Qiu et al. (in: ACM Symposium on parallelism in algorithms and architectures. ACM, New York, pp 294–303, 2015) obtain the first constant approximation algorithms for this problem via LP rounding and give a deterministic \(\frac{67}{3}\)-approximation and a randomized \((9 + \frac{16\sqrt{2}}{3}) \approx 16.54\)-approximation algorithm. In this paper, we give a combinatorial algorithm that yields a deterministic 5-approximation algorithm for coflow scheduling with release times, and a deterministic 4-approximation for the case without release times. For the concurrent open shop problem with release times, we give a combinatorial 3-approximation algorithm.


References

  1. Apache Software Foundation: Hadoop. https://hadoop.apache.org

  2. Bansal, N., Khot, S.: Inapproximability of hypergraph vertex cover and applications to scheduling problems. In: International Colloquium on Automata, Languages and Programming, pp. 250–261. Springer (2010)

  3. Chen, Z.L., Hall, N.G.: Supply chain scheduling: conflict and cooperation in assembly systems. Oper. Res. 55(6), 1072–1089 (2007)


  4. Chowdhury, M., Stoica, I.: Coflow: A networking abstraction for cluster applications. In: ACM Workshop on Hot Topics in Networks, pp. 31–36. ACM (2012)

  5. Chowdhury, M., Stoica, I.: Efficient coflow scheduling without prior knowledge. In: SIGCOMM, pp. 393–406. ACM (2015)

  6. Chowdhury, M., Zhong, Y., Stoica, I.: Efficient coflow scheduling with Varys. In: SIGCOMM, pp. 443–454. ACM, New York, NY, USA (2014)

  7. Davis, J.M., Gandhi, R., Kothari, V.H.: Combinatorial algorithms for minimizing the weighted sum of completion times on a single machine. Oper. Res. Lett. 41(2), 121–125 (2013)


  8. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)


  9. Garg, N., Kumar, A., Pandit, V.: Order scheduling models: Hardness and algorithms. In: FSTTCS, pp. 96–107. Springer (2007)

  10. Im, S., Moseley, B., Pruhs, K., Purohit, M.: Matroid coflow scheduling. In: International Colloquium on Automata, Languages and Programming (2019)

  11. Khuller, S., Li, J., Sturmfels, P., Sun, K., Venkat, P.: Select and permute: an improved online framework for scheduling to minimize weighted completion time. Theor. Comput. Sci. 795, 420–431 (2019)


  12. Khuller, S., Purohit, M.: Brief announcement: Improved approximation algorithms for scheduling co-flows. In: Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 239–240. ACM, New York, NY, USA (2016)

  13. Leung, J.Y.T., Li, H., Pinedo, M.: Scheduling orders for multiple product types to minimize total weighted completion time. Discrete Appl. Math. 155(8), 945–970 (2007)


  14. Luo, S., Yu, H., Zhao, Y., Wang, S., Yu, S., Li, L.: Towards practical and near-optimal coflow scheduling for data center networks. IEEE Trans. Parallel Distrib. Syst. 27(11), 3366–3380 (2016)


  15. Mastrolilli, M., Queyranne, M., Schulz, A.S., Svensson, O., Uhan, N.A.: Minimizing the sum of weighted completion times in a concurrent open shop. Oper. Res. Lett. 38(5), 390–395 (2010)


  16. Qiu, Z., Stein, C., Zhong, Y.: Minimizing the total weighted completion time of coflows in datacenter networks. In: ACM Symposium on Parallelism in Algorithms and Architectures, pp. 294–303. ACM, New York, NY, USA (2015)

  17. Queyranne, M.: Structure of a simple scheduling polyhedron. Math. Program. 58(1–3), 263–285 (1993)


  18. Sachdeva, S., Saket, R.: Optimal inapproximability for scheduling problems via structural hardness for hypergraph vertex cover. In: IEEE Conference on Computational Complexity, pp. 219–229. IEEE (2013)

  19. Shafiee, M., Ghaderi, J.: An improved bound for minimizing the total weighted completion time of coflows in datacenters. IEEE/ACM Trans. Netw. 26(4), 1674–1687 (2018)


  20. Wang, G., Cheng, T.E.: Customer order scheduling to minimize total weighted completion time. Omega 35(5), 623–626 (2007)


  21. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. HotCloud 10, 10–10 (2010)


  22. Zhao, Y., Chen, K., Bai, W., Yu, M., Tian, C., Geng, Y., Zhang, Y., Li, D., Wang, S.: Rapier: Integrating routing and scheduling for coflow-aware data center networks. In: IEEE International Conference on Computer Communications, pp. 424–432. IEEE (2015)


Acknowledgements

This work is supported by NSF Grants CNS 156019 and CCF 1655073 (Eager), and partially supported by an Amazon Grant.

Author information

Corresponding author

Correspondence to Saba Ahmadi.

Additional information

A preliminary version of this paper appears in Proceedings of IPCO 2017.

Appendices

Appendix 1: A Combinatorial 3-Approximation Algorithm for Concurrent Open Shop with Release Times

Theorem 1

Algorithm 1 gives a 3-approximation for concurrent open shop scheduling with release times.

Proof

We use Algorithm 1 to obtain a permutation \(\{1,2,\ldots ,n\}\) of the set of jobs J. Scheduling the jobs sequentially according to this permutation gives:

$$\begin{aligned} C_{j} \le \max _{i'\le j}r_{i'} + \sum _{k\le j} L_{\mu (j),k} \end{aligned}$$
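The bound above can be checked numerically on a small instance. The following is a minimal sketch, assuming that "sequentially" means each machine processes the jobs in permutation order and starts a job's operation as soon as the machine is free and the job is released, and that \(\mu (j)\) is the most heavily loaded machine over the first j jobs (neither assumption is spelled out in this appendix). The function names and the instance are illustrative only.

```python
def sequential_completion_times(L, r):
    """L[i][k]: processing time of job k on machine i; r[k]: release time.
    Jobs are indexed in permutation order. Returns completion times C_j."""
    free = [0] * len(L)          # time at which each machine becomes idle
    completion = []
    for j in range(len(r)):
        for i, row in enumerate(L):
            # machine i starts job j once it is idle and job j is released
            free[i] = max(free[i], r[j]) + row[j]
        completion.append(max(free))
    return completion


def upper_bounds(L, r):
    """Right-hand side of the bound: latest release among the first j jobs
    plus the largest per-machine load of the first j jobs."""
    bounds = []
    for j in range(len(r)):
        latest_release = max(r[: j + 1])
        heaviest_load = max(sum(row[: j + 1]) for row in L)
        bounds.append(latest_release + heaviest_load)
    return bounds


# Hypothetical instance: 2 machines, 3 jobs.
L = [[2, 1, 3],
     [1, 4, 1]]
r = [0, 3, 2]
C = sequential_completion_times(L, r)
B = upper_bounds(L, r)
print(C, B)
```

On this instance every completion time is at most the corresponding bound, as the inequality predicts.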

Lemma 5 with \(a=1\) and \(b=1\) implies that:

$$\begin{aligned} \sum _j w_j C_j(alg)&\le \left( 1+\frac{1}{\kappa }\right) \sum _{j=1}^n \sum _{i\in M} \alpha _{i,j} r_{j} + 2(\kappa + 1) \sum _{i\in M} \sum _{S\subseteq J} \beta _{i,S} f_i(S) \end{aligned}$$

To minimize the approximation ratio, we substitute \(\kappa = \frac{1}{2}\) and obtain

$$\begin{aligned} \sum _j w_j C_j(alg)&\le 3 \left( \sum _{j=1}^n \sum _{i\in M} \alpha _{i,j} r_{j} + \sum _{i\in M} \sum _{S\subseteq J} \beta _{i,S} f_i(S) \right) \le 3 \cdot OPT \end{aligned}$$

where the last inequality follows from weak duality as \(\alpha\) and \(\beta\) constitute a feasible dual solution. \(\square\)
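As a quick check of the choice of \(\kappa\) (a worked calculation added for illustration, not part of the original proof): the two coefficients \(1+\frac{1}{\kappa }\) and \(2(\kappa +1)\) are respectively decreasing and increasing in \(\kappa > 0\), so the larger of the two is minimized where they coincide.

$$\begin{aligned} 1+\frac{1}{\kappa } = 2(\kappa + 1) \;\Longleftrightarrow \; 2\kappa ^2 + \kappa - 1 = 0 \;\Longleftrightarrow \; (2\kappa - 1)(\kappa + 1) = 0 \;\Longrightarrow \; \kappa = \frac{1}{2}, \qquad 1+\frac{1}{\kappa } = 2(\kappa + 1) = 3. \end{aligned}$$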

Appendix 2: Correction of Algorithm by Qiu et al. [16]

We now give a brief overview of the approximation algorithm given by Qiu, Stein, and Zhong [16].

1.1 Interval-Indexed LP Formulation

In the first step we write an interval-indexed linear programming relaxation for the coflow scheduling problem similar to that for the concurrent open shop problem by Wang and Cheng [20].

Let \(\bar{C_j}\) denote the approximated completion time of coflow j obtained by an optimal feasible solution to this LP relaxation. We first order the coflows in non-decreasing order of these approximated completion times, i.e. we have the following.

$$\begin{aligned} \bar{C_1} \le \bar{C_2} \le \cdots \le \bar{C_n} \end{aligned}$$
(19)

Let \(V_j\) denote the maximum load on any port by the first j coflows taken together in the above ordering, i.e.

$$\begin{aligned} V_j = \max \left[ \max _i \left\{ \sum _{k = 1}^j \sum _{o} d^k_{io}\right\} , \max _o \left\{ \sum _{k = 1}^j \sum _{i} d^k_{io}\right\} \right] . \end{aligned}$$
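The definition of \(V_j\) translates directly into a running computation over the ordered coflows. The sketch below is illustrative only; the function name `prefix_max_loads` and the nested-list layout `demands[k][i][o]` for \(d^k_{io}\) are assumptions made for this example.

```python
def prefix_max_loads(demands):
    """demands[k][i][o] holds d^k_{io} for the k-th coflow in the ordering.
    Returns [V_1, ..., V_n], the maximum port load of the first j coflows."""
    m = len(demands[0])
    in_load = [0] * m    # cumulative load on each input port
    out_load = [0] * m   # cumulative load on each output port
    V = []
    for d in demands:
        for i in range(m):
            for o in range(m):
                in_load[i] += d[i][o]
                out_load[o] += d[i][o]
        V.append(max(max(in_load), max(out_load)))
    return V


# Hypothetical 2x2 instance with two coflows.
demands = [
    [[1, 2], [0, 0]],   # coflow 1: input port 1 sends 1 and 2 units
    [[0, 0], [3, 1]],   # coflow 2: input port 2 sends 3 and 1 units
]
print(prefix_max_loads(demands))   # [3, 4]
```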

Qiu et al. [16] prove that these \(V_j\) values provide a good approximation for the optimal completion times of the coflows. In particular, they show the following where \(C_j^*\) denotes the completion time of coflow j in an optimal schedule.

$$\begin{aligned} \sum _{j} w_j V_j \le \frac{16}{3} \sum _j w_j C_j^* \end{aligned}$$
(20)

1.2 Grouping Coflows

Divide time into geometrically increasing intervals as follows: \([1], [2], [3,4], [5,8], [9,16], \ldots\). Let \(I_l = (2^{l-2}, 2^{l-1}]\) denote the \(l^{\text {th}}\) interval.

Now group the coflows based on the interval where their V values lie and let \(S_l\) denote the set of coflows assigned to interval \(I_l\). In other words, for all coflows \(j \in S_l\), we have \(2^{l-2} < V_j \le 2^{l-1}\).
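The grouping rule can be sketched as follows, assuming the V values are positive integers (as they are when demands are integral). The helper names `interval_index_int` and `group_by_interval` are hypothetical and used only for this illustration.

```python
def interval_index_int(v):
    """Return l with 2^(l-2) < v <= 2^(l-1), for an integer v >= 1."""
    return (v - 1).bit_length() + 1


def group_by_interval(V):
    """Map each interval index l to the list of coflow indices in S_l."""
    groups = {}
    for j, v in enumerate(V):
        groups.setdefault(interval_index_int(v), []).append(j)
    return groups


# Values 1, 2, 3..4, 5..8 and 9..16 fall in intervals 1, 2, 3, 4 and 5.
print([interval_index_int(v) for v in [1, 2, 3, 4, 5, 8, 9, 16]])
# [1, 2, 3, 3, 4, 4, 5, 5]
```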

1.2.1 Algorithm 1

  • For \(l = 1, 2, \ldots\)

    • Wait until the last coflow in \(S_l\) is released.

    • Group all coflows in \(S_l\) and schedule as per Algorithm 1 in [16]. This would take time at most \(V_k \le 2^{l-1}\) where k is the last job in the group.

1.2.2 Analysis

Qiu et al. claim the following (Proposition 1 in [16]).

Proposition 1

For any coflow j, let \(C_j(alg)\) denote the completion time of coflow j as per Algorithm 1. Then we have

$$\begin{aligned} C_j(alg) \le \max _{1 \le g \le j}\{r_g\} + 4 V_j. \end{aligned}$$

Since \(\displaystyle C_j^* \ge \max _{1 \le g \le j}\{r_g\}\), Proposition 1 and Equation (20) together imply the following theorem (Theorem 1 in [16]).

Theorem 2

There exists a deterministic polynomial time 67/3 approximation algorithm for coflow scheduling, i.e.

$$\begin{aligned} \sum _{j} w_j C_j(alg) \le \frac{67}{3} \sum _j w_j C_j^*. \end{aligned}$$
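For reference, the arithmetic behind the claimed factor can be written out as follows (a worked restatement added for illustration). It combines the bound of Proposition 1, the inequality \(C_j^* \ge \max _{1 \le g \le j}\{r_g\}\), and Equation (20).

$$\begin{aligned} \sum _{j} w_j C_j(alg) \le \sum _{j} w_j \max _{1 \le g \le j}\{r_g\} + 4 \sum _{j} w_j V_j \le \sum _{j} w_j C_j^* + 4 \cdot \frac{16}{3} \sum _{j} w_j C_j^* = \frac{67}{3} \sum _{j} w_j C_j^*. \end{aligned}$$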

1.3 Error

We now show that Proposition 1 stated above is incorrect. Consequently, Theorem 2 above (Theorem 1 in [16]) no longer holds. Recall that Algorithm 1 groups jobs based on their V values alone and does not consider their release times.

Consider a simple case where \(m = 1\), i.e., there is just one input port and one output port. Suppose we have two jobs \(j_1\) and \(j_2\) such that \(j_1\) needs to send 3 units of data and \(j_2\) needs to send 1 unit of data, with \(r_{j_1} = 0\) and \(r_{j_2} = 100\). By definition, we have \(V_{j_1} = 3\) and \(V_{j_2} = 4\), so both jobs belong to the same interval \(I_3 = (2,4]\). Algorithm 1 therefore waits for both jobs to be released and then schedules them together (after time 100). Proposition 1 would require \(C_{j_1}(alg) \le r_{j_1} + 4V_{j_1} = 12\), yet \(j_1\) completes only after time 100, so the claim does not hold for job \(j_1\).
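The counterexample can be replayed with a small simulation. This is a minimal sketch for the single-port case only: it assumes that the coflows of a group are transmitted one after another in the given order (with \(m = 1\) any schedule is sequential), which stands in for the within-group schedule of [16]. The function names are hypothetical.

```python
def interval_index(v):
    """Return l with 2^(l-2) < v <= 2^(l-1); assumes v > 1/2."""
    l = 1
    while v > 2 ** (l - 1):
        l += 1
    return l


def grouped_schedule_single_port(sizes, releases):
    """Group coflows by the interval of V_j and run each group only after
    its last member is released (single input/output port, m = 1)."""
    V, total = [], 0
    for d in sizes:
        total += d
        V.append(total)            # on one port, V_j is a prefix sum
    groups = {}
    for j, v in enumerate(V):
        groups.setdefault(interval_index(v), []).append(j)
    clock = 0
    completion = [0] * len(sizes)
    for l in sorted(groups):
        members = groups[l]
        clock = max(clock, max(releases[j] for j in members))
        for j in members:          # assumed: sequential within the group
            clock += sizes[j]
            completion[j] = clock
    return V, completion


V, C = grouped_schedule_single_port(sizes=[3, 1], releases=[0, 100])
claimed_bound_j1 = 0 + 4 * V[0]    # max release among first job + 4 * V_1
print(V, C, claimed_bound_j1)      # [3, 4] [103, 104] 12
```

Under these assumptions \(j_1\) finishes at time 103, far beyond the value 12 that Proposition 1 would allow.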

Proposition 2 in [16] makes a similar claim for a grouping algorithm using randomized intervals. Again, the above instance serves as a counterexample to the claim. Consequently, Theorem 2 in [16] does not hold.

In the following section, we show that the deterministic grouping algorithm can be modified to yield a \(\frac{76}{3}\)-approximation algorithm. Note that this is worse than the \(\frac{67}{3}\) factor claimed earlier. It is not immediately clear whether the randomized algorithm from [16] can be corrected via a similar modification.

1.4 Corrected Grouping Algorithm

We first solve the interval-indexed LP formulation to obtain approximated completion times \(\bar{C_j}\). Without loss of generality, we assume that the coflows are ordered as per Equation (19).

As shown by Leung, Li, and Pinedo (Theorem 13 in [13]), the analysis of Wang and Cheng [20] can be extended to the case of general release times to obtain the following.

$$\begin{aligned} \sum _{j} w_j\bar{C_j} \le \frac{19}{3} \sum _{j} w_jC_j^* \end{aligned}$$
(21)

This is analogous to Lemma 3 in [16] that shows that \(\sum _j w_j V_j \le \frac{16}{3} \sum _j w_j C_j^*\) where \(V_j\) is the maximum load on any port by the first j coflows taken together (as per the ordering).

Since \(\bar{C_j}\) denotes the approximated completion time of coflow j as computed by a valid LP relaxation, we also have the following, where \(r_j\) denotes the release time of coflow j.

$$\begin{aligned} \bar{C_j}&\ge r_j \end{aligned}$$
(22)
$$\begin{aligned} \bar{C_j}&\ge V_j \end{aligned}$$
(23)

1.4.1 Grouping Coflows

Divide time into geometrically increasing intervals as follows: \([1], [2], [3,4], [5,8], [9,16], \ldots\). Let \(I_l = (2^{l-2}, 2^{l-1}]\) denote the \(l^{\text {th}}\) interval.

Now group the coflows based on the interval where their \({\bar{C}}\) values lie and let \(S_l\) denote the set of coflows assigned to interval \(I_l\). So for all coflows \(j \in S_l\), we have \(2^{l-2} < \bar{C_j} \le 2^{l-1}\).

1.4.2 Algorithm

  • For \(l = 1, 2, \ldots\)

    • Wait until the last coflow in \(S_l\) is released and all coflows in \(S_{l-1}\) have finished (whichever is later).

    • Group all coflows in \(S_l\) and schedule as per Algorithm 1 in [16]. This would take time at most \(V_k \le \bar{C_k} \le 2^{l-1}\), where k is the last coflow in the group.

1.4.3 Analysis

Let \(\tilde{C_l}\) denote the time by which all coflows in \(S_{l}\) have been scheduled by the above algorithm.

Claim

\(\tilde{C_l} \le 2 \times 2^{l-1} = 2^l\) for every group \(S_l\).

Proof

We prove the claim by induction on l. For group \(S_1\), we start executing the schedule at \(\max _{j \in S_1} r_j \le \max _{j \in S_1} \bar{C_j} \le 2^{1-1} = 1\) and the schedule takes time at most \(V_k \le 2^{1-1} = 1\) where k is the last coflow in the group. So the base case is true.

Now assume that the claim is true for some group \(S_l\). As per the algorithm, the coflows in group \(S_{l+1}\) start executing at \(\tilde{C_l}\) or \(\max _{j \in S_{l+1}} r_j\), whichever is later. By induction, we are guaranteed that \(\tilde{C_l} \le 2^l\). Also \(\max _{j \in S_{l+1}} r_j \le \max _{j \in S_{l+1}} \bar{C_j} \le 2^l\). Thus the coflows in group \(S_{l+1}\) start executing no later than time \(2^l\). We know that all these coflows require at most \(V_k \le \bar{C_k} \le 2^l\) time units to complete. As a result, all the coflows in this group are scheduled by time \(2^l + 2^l = 2^{l+1}\).

And thus the claim follows by induction. \(\square\)

Claim

For any coflow j, let \(C_j(alg)\) denote the completion time of coflow j as per the algorithm. Then \(C_j(alg) < 4\bar{C_j}\).

Proof

Consider any coflow j, and let l be such that \(j \in S_l\). Hence we have \(\bar{C_j} > 2^{l-2}\). By the previous claim, we have

$$\begin{aligned} C_j(alg) \le \tilde{C_l} \le 2^l = 4 \times 2^{l-2} < 4\bar{C_j} \end{aligned}$$

\(\square\)
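The two claims can be checked on the instance from the counterexample above. The following is a minimal single-port sketch: the \(\bar{C_j}\) values are hypothetical numbers chosen to satisfy Equations (22) and (23) rather than the output of an LP solver, and coflows within a group are assumed to run one after another. The function names are illustrative.

```python
def interval_index(c):
    """Return l with 2^(l-2) < c <= 2^(l-1); assumes c > 1/2."""
    l = 1
    while c > 2 ** (l - 1):
        l += 1
    return l


def corrected_grouping_single_port(c_bar, releases, sizes):
    """Group coflows by the interval of their (assumed) LP value c_bar[j];
    start group l once its last coflow is released and the previous group
    has finished. Returns per-coflow and per-group finish times."""
    groups = {}
    for j, c in enumerate(c_bar):
        groups.setdefault(interval_index(c), []).append(j)
    clock = 0
    completion = [0] * len(c_bar)
    group_finish = {}
    for l in sorted(groups):
        members = groups[l]
        clock = max(clock, max(releases[j] for j in members))
        for j in members:          # assumed: sequential within the group
            clock += sizes[j]
            completion[j] = clock
        group_finish[l] = clock
    return completion, group_finish


# Hypothetical LP values with c_bar[j] >= r_j and c_bar[j] >= V_j.
c_bar, releases, sizes = [3, 101], [0, 100], [3, 1]
completion, group_finish = corrected_grouping_single_port(c_bar, releases, sizes)
print(completion, group_finish)    # [3, 101] {3: 3, 8: 101}
```

Here the first coflow is no longer held back by the late release of the second one, each group l finishes by time \(2^l\), and each completion time is below \(4\bar{C_j}\).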

Corollary 2

There is a deterministic \(\frac{76}{3}\)-approximation for coflow scheduling with arbitrary release times.

Proof

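The preceding claim and Equation (21) together give \(\sum _j w_j C_j(alg) < 4 \sum _j w_j \bar{C_j} \le 4 \cdot \frac{19}{3} \sum _j w_j C_j^* = \frac{76}{3} \sum _j w_j C_j^*\), i.e., a \(\frac{76}{3}\)-approximation algorithm for coflow scheduling with release times. \(\square\)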

Appendix 3: Counterexample to Claim by Luo et al. [14]

Luo et al. [14] claim a 2-approximation algorithm for the coflow scheduling problem by proving that it is equivalent to concurrent open shop scheduling. One of the key ingredients of their proof is the following claim that is implicit in Lemma 3 in [14].

Claim 3

(Restated from [14]) Given two coflows \(G_k\) and \(G_l\), we can find a feasible schedule for both the coflows such that \(C_k + C_l = \min \{\varDelta (G_k) + \varDelta (G_k \cup G_l), \varDelta (G_l) + \varDelta (G_k \cup G_l)\}\).

1.1 Counterexample

We show that Claim 3 is erroneous via a simple counterexample. Consider two coflows on a \(3 \times 3\) datacenter as shown in Fig. 5. Note that while coflows \(G_1\) and \(G_2\) have \(\varDelta (G_1) = 1\) and \(\varDelta (G_2) = 2\), the combined coflow \(G_1 \cup G_2\) also has \(\varDelta (G_1 \cup G_2) = 2\). Consequently, the RHS in Claim 3 equals \(\varDelta (G_1) + \varDelta (G_1 \cup G_2) = 3\).

Fig. 5 Simple counterexample to Claim 3

As seen in Fig. 5, if coflow \(G_1\) is scheduled so that \(C_1 = \varDelta (G_1) = 1\), then the matching constraints force coflow \(G_2\) to have completion time \(C_2 = 3\). Alternatively, delaying one edge of coflow \(G_1\) leads to a schedule with \(C_1 = C_2 = 2\). In both cases, we have \(C_1 + C_2 = 4\) (instead of 3), which contradicts the claim.
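A brute-force search over unit-slot schedules illustrates the gap. The figure itself is not reproduced here, so the instance below is a hypothetical one constructed to have the same parameters (\(\varDelta (G_1) = 1\), \(\varDelta (G_2) = 2\), \(\varDelta (G_1 \cup G_2) = 2\)); it is not claimed to be the instance of Fig. 5. The sketch also assumes unit-size flows and integral time slots in which the scheduled edges must form a matching.

```python
from itertools import product

# Hypothetical instance: each edge (input port, output port) carries 1 unit.
G1 = [(1, 1), (3, 2)]
G2 = [(2, 1), (2, 2)]


def max_port_load(edges):
    """Maximum number of units touching any single input or output port."""
    load = {}
    for i, o in edges:
        load[("in", i)] = load.get(("in", i), 0) + 1
        load[("out", o)] = load.get(("out", o), 0) + 1
    return max(load.values())


def best_total_completion(g1, g2):
    """Minimum C_1 + C_2 over all assignments of edges to unit time slots
    in which no port is used twice in the same slot."""
    edges = g1 + g2
    best = None
    for slots in product(range(1, len(edges) + 1), repeat=len(edges)):
        used = set()
        feasible = True
        for (i, o), t in zip(edges, slots):
            if ("in", i, t) in used or ("out", o, t) in used:
                feasible = False
                break
            used.add(("in", i, t))
            used.add(("out", o, t))
        if not feasible:
            continue
        total = max(slots[: len(g1)]) + max(slots[len(g1):])
        if best is None or total < best:
            best = total
    return best


union_load = max_port_load(G1 + G2)
claimed = min(max_port_load(G1) + union_load, max_port_load(G2) + union_load)
print(claimed, best_total_completion(G1, G2))   # 3 4
```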


Cite this article

Ahmadi, S., Khuller, S., Purohit, M. et al. On Scheduling Coflows. Algorithmica 82, 3604–3629 (2020). https://doi.org/10.1007/s00453-020-00741-3
