Abstract
In this paper, we consider energy-minimization scheduling of rigid parallel jobs on homogeneous speed-scalable processors. While most previous work has studied single-processor jobs, we focus on rigid parallel jobs, which use more than one processor simultaneously. Each job is specified by its release date, its deadline, its processing volume, and the number of processors it requires. First, we develop constant-factor approximation algorithms for such interesting cases as agreeable jobs without migration and preemptive instances. Next, we propose a configuration linear program, which allows us to obtain an “almost exact” solution for the preemptive setting. Finally, for non-preemptive agreeable jobs with unit-work operations, we present a 3-approximation algorithm by generalizing the known exact algorithm for single-processor jobs.



References
Albers, S., Antoniadis, A., & Greiner, G. (2015). On multi-processor speed scaling with migration. Journal of Computer and System Sciences, 81, 1194–1209.
Albers, S., Bampis, E., Letsios, D., Lucarelli, G., & Stotz, R. (2017). Scheduling on power-heterogeneous processors. Information and Computation, 257, 22–33.
Albers, S., Müller, F., & Schmelzer, S. (2014). Speed scaling on parallel processors. Algorithmica, 68(2), 404–425.
Angel, E., Bampis, E., Kacem, F., & Letsios, D. (2019). Speed scaling on parallel processors with migration. Journal of Combinatorial Optimization, 37(4), 1266–1282.
Antoniadis, A., & Huang, C. C. (2013). Non-preemptive speed scaling. Journal of Scheduling, 16(4), 385–394.
Bampis, E., Kononov, A., Letsios, D., Lucarelli, G., & Nemparis, I. (2015). From preemptive to non-preemptive speed-scaling scheduling. Discrete Applied Mathematics, 181, 11–20.
Bampis, E., Kononov, A., Letsios, D., Lucarelli, G., & Sviridenko, M. (2018). Energy efficient scheduling and routing via randomized rounding. Journal of Scheduling, 21(1), 35–51.
Bampis, E., Letsios, D., & Lucarelli, G. (2015). Green scheduling, flows and matchings. Theoretical Computer Science, 579, 126–136.
Bingham, B. D., & Greenstreet, M. R. (2008). Energy optimal scheduling on multiprocessors with migration. In International symposium on parallel and distributed processing with applications, ISPA 2008 (pp. 153–161).
Brodtkorb, A. R., Dyken, C., Hagen, T. R., Hjelmervik, J. M., & Storaasli, O. O. (2010). State-of-the-art in heterogeneous computing. Scientific Programming, 18, 1–33.
Chen, J., Hsu, H., Chuang, K., Yang, C., Pang, A., & Kuo, T. (2004). Multiprocessor energy-efficient scheduling with task migration considerations. In 16th euromicro conference on real-time systems, ECRTS (pp. 101–108). IEEE.
Cohen-Addad, V., Li, Z., Mathieu, C., & Milis, I. (2015). Energy-efficient algorithms for non-preemptive speed-scaling. In International workshop on approximation and online algorithms, WAOA 2014. LNCS (Vol. 8952, pp. 107–118). Springer, Berlin.
Drozdowski, M. (2009). Scheduling for parallel processing. London: Springer.
Gerards, M. E. T., Hurink, J. L., & Hölzenspies, P. K. F. (2016). A survey of offline algorithms for energy minimization under deadline constraints. Journal of Scheduling, 19, 3–19.
Greiner, G., Nonner, T., & Souza, A. (2014). The bell is ringing in speed-scaled multiprocessor scheduling. Theory of Computing Systems, 54(1), 24–44.
Grötschel, M., Lovász, L., & Schrijver, A. (1993). Geometric algorithms and combinatorial optimization (2nd corrected ed.). Berlin: Springer.
Gupta, A., Im, S., Krishnaswamy, R., Moseley, B., & Pruhs, K. (2012). Scheduling heterogeneous processors isn’t as easy as you think. In Twenty-third annual ACM-SIAM symposium on discrete algorithms (pp. 1242–1253).
Gupta, A., Krishnaswamy, R., & Pruhs, K. (2010a). Nonclairvoyantly scheduling power-heterogeneous processors. In Proceedings of the international green computing conference (pp. 165–173).
Gupta, A., Krishnaswamy, R., & Pruhs, K. (2010b). Scalably scheduling power-heterogeneous processors. In Proceedings of the international colloquium on automata, languages, and programming (pp. 312–323).
Huang, W., & Wang, Y. (2009). An optimal speed control scheme supported by media servers for low-power multimedia applications. Multimedia Systems, 15(2), 113–124.
Jansen, K., & Porkolab, L. (2000). Preemptive parallel task scheduling in \(O(n)+\mathrm{poly}(m)\) time. In D. T. Lee & S. H. Teng (Eds.), Proceedings of ISAAC 2000. LNCS (Vol. 1969, pp. 398–409). Springer, Berlin.
Johannes, B. (2006). Scheduling parallel jobs to minimize the makespan. Journal of Scheduling, 9, 433–452.
Karzanov, A. (1974). Determining the maximal flow in a network by the method of preflows. Soviet Mathematics Doklady, 15(2), 434–437.
Kononov, A., & Kovalenko, Y. (2016). On speed scaling scheduling of parallel jobs with preemption. In International conference on discrete optimization and operations research, DOOR-2016. LNCS (Vol. 9869, pp. 309–321). Springer, Berlin.
Kononov, A., & Kovalenko, Y. (2017). An approximation algorithm for preemptive speed scaling scheduling of parallel jobs with migration. In International conference on learning and intelligent optimization, LION 2017. LNCS (Vol. 10556, pp. 351–357). Springer, Berlin.
Li, M., Yao, F., & Yuan, H. (2017). An \(O(n^2)\) algorithm for computing optimal continuous voltage schedules. In Theory and applications of models of computation, TAMC 2017. LNCS (Vol. 10185, pp. 389–400). Springer.
Naroska, E., & Schwiegelshohn, U. (2002). On an on-line scheduling problem for parallel jobs. Information Processing Letters, 81, 297–304.
Shioura, A., Shakhlevich, N., & Strusevich, V. (2017). Machine speed scaling by adapting methods for convex optimization with submodular constraints. INFORMS Journal on Computing, 29(4), 724–736.
Wu, W., Li, M., & Chen, E. (2011). Min-energy scheduling for aligned jobs in accelerate model. Theoretical Computer Science, 412(12–14), 1122–1139.
Yao, F., Demers, A., & Shenker, S. (1995). A scheduling model for reduced CPU energy. In 36th annual symposium on foundations of computer science, FOCS 1995 (pp. 374–382). IEEE.
Acknowledgements
A. Kononov was supported by Program no. I.5.1 of Fundamental Research of the Siberian Branch of the Russian Academy of Sciences (project no. 0314-2019-0014). Yu. Kovalenko was supported by Program no. I.5.1 of Fundamental Research of the Siberian Branch of the Russian Academy of Sciences (project no. 0314-2019-0019).
Appendix
The Appendix contains an exact method for computing the lower bound on the energy consumption and approximation algorithms for particular cases of the speed-scaling problem when rigid jobs have a common release date and/or a common deadline.
Lower bound model
Using the approach from Shioura et al. (2017), we present an exact algorithm for finding a max-flow in the bipartite network \(G=(V,A)\) proposed in Sect. 2.1 (see Fig. 4) that minimizes the total cost.
Note that our network differs from the network in Shioura et al. (2017) only in the capacities of the arcs from job nodes to interval nodes (\(\mu (j,I_k)=\Delta _k\) in Shioura et al. 2017). First, we reformulate the problem in terms of submodular optimization. Let \(q(X):=\sum _{j\in X}q_j\) denote the total duration of the jobs of X when they are executed on a single processor, and let \(\varphi :2^{\mathcal {J}}\rightarrow \mathbb {R}\) be the polymatroid rank function defined below. Then all possible max-flows are characterized by the base polyhedron \(B(\varphi )=\{ q\in Q_{(+)}(\varphi ):\ q(\mathcal {J})=\varphi (\mathcal {J})\}\) of the polymatroid polyhedron \(Q_{(+)}(\varphi )=\{ q\in \mathbb {R}^n_+:\ q(X)\le \varphi (X),\ X\in 2^{\mathcal {J}}\}\).
For \(X\subseteq \mathcal {J}\) and \(h=1,\dots ,\gamma \), we denote by

$$m_h(X)=\min \Big \{ m,\ \sum _{j\in X:\ I_h\subseteq [r_j,d_j)} size_j\Big \}$$

the number of processors that can be utilized by jobs from X in interval \(I_h\). Then the value \(\varphi (X)\) is explicitly given as

$$\varphi (X)=\sum _{h=1}^{\gamma } m_h(X)\,\Delta _h$$

and specifies the total duration of all time intervals available for processing the jobs of set X.
Therefore, in terms of submodular optimization, the considered problem can be reformulated as problem (14): minimizing the total cost over the base polyhedron \(B(\varphi )\).
To solve the problem (14), we use the decomposition algorithm from Shioura et al. (2017). The scheme is presented in Algorithm 2. Note that the subproblems to be solved in Steps 5 and 6 have a common structure. Hence, the original problem is solved recursively. The number of subproblems generated by the decomposition algorithm is O(n), and therefore Steps 1, 2 and 3 are performed O(n) times. Hence, the overall running time of the decomposition algorithm is \(O(nT_{123}(n))\), where \(T_{123}(n)\) is the time complexity of Steps 1, 2 and 3.
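To make the recursion concrete, the following Python sketch mirrors the structure just described; relax, max_feasible, tight_set, restrict and contract are hypothetical placeholders for Steps 1–3 of Algorithm 2 and for the restriction/contraction of the rank function, whose exact form is given in Shioura et al. (2017).

```python
# A structural sketch of the decomposition scheme; all helpers are
# hypothetical placeholders for the steps named in Algorithm 2.
def decompose(jobs, phi):
    """Minimize the separable convex cost over the base polyhedron B(phi)."""
    if not jobs:
        return {}
    b = relax(jobs, phi)            # Step 1: optimum subject only to q(J) = phi(J)
    c = max_feasible(jobs, phi, b)  # Step 2: componentwise-maximal c <= b (max-flow)
    Y = tight_set(jobs, phi, b, c)  # Step 3: Y* with c(Y*) = phi(Y*) (min cut)
    if all(c[j] == b[j] for j in jobs):
        return b                    # b is already feasible, hence optimal
    q = dict(decompose(Y, restrict(phi, Y)))         # Step 5: recurse on Y*
    q.update(decompose(jobs - Y, contract(phi, Y)))  # Step 6: recurse on J \ Y*
    return q
```

Each call splits the job set into two nonempty parts, so O(n) subproblems arise, matching the bound above.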
Step 1 can be implemented in \(O(n)\) time, using the necessary optimality condition (15). Combining (15) with (16), we obtain (17).
The problem of Step 2 reduces to the max-flow problem in the network \(G_b\) obtained from network \(G=(V,A)\) by replacing the capacities \(\mu (s,j)=+\infty \) with b(j) for all \(j\in \mathcal {J}\). For a max-flow \(x^*\) in \(G_b\), an optimal solution to the problem of Step 2 is given by \(c_j=x^*(s,j),\ j\in \mathcal {J}\). At Step 3, a set \(Y^*\) is constructed from a minimum s-t cut \((S^*,T^*)\) in \(G_b\) as \(Y^*=S^*\), since \(c(S^*)=\varphi (S^*)\) and \(c_j=b_j\) for \(j\in T^*\). The results for Steps 2 and 3 follow immediately from Lemma 3 in Shioura et al. (2017). A max-flow in \(G_b\) can be found in \(O(n^3)\) time by the algorithm of Karzanov (1974), and a minimum s-t cut in \(G_b\) is then computed in \(O(n^2)\) operations. Therefore, the time complexity of the decomposition algorithm is \(O(n^4)\).
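For concreteness, the following Python sketch (using networkx) assembles \(G_b\) and performs Steps 2 and 3. It assumes the job-to-interval arcs carry capacity \(size_j\Delta _h\) whenever \(I_h\subseteq [r_j,d_j)\) (the text above only states that these capacities differ from those in Shioura et al. 2017), and the data names rel, dl, size, b and intervals are illustrative.

```python
# A minimal sketch of Steps 2 and 3 on the network G_b; the capacity
# size[j] * (hi - lo) on job-to-interval arcs is an assumption.
import networkx as nx

def steps_2_and_3(jobs, intervals, m, rel, dl, size, b):
    G = nx.DiGraph()
    for j in jobs:
        G.add_edge('s', ('job', j), capacity=b[j])      # mu(s, j) = b(j)
        for h, (lo, hi) in enumerate(intervals):
            if rel[j] <= lo and hi <= dl[j]:            # I_h inside [r_j, d_j)
                G.add_edge(('job', j), ('int', h),
                           capacity=size[j] * (hi - lo))
    for h, (lo, hi) in enumerate(intervals):
        G.add_edge(('int', h), 't', capacity=m * (hi - lo))  # m processors in I_h
    _, flow = nx.maximum_flow(G, 's', 't')              # Step 2: c_j = x*(s, j)
    c = {j: flow['s'][('job', j)] for j in jobs}
    _, (S, _) = nx.minimum_cut(G, 's', 't')             # Step 3: Y* from (S*, T*)
    Y = {j for j in jobs if ('job', j) in S}
    return c, Y
```

Any polynomial max-flow routine serves in the sketch; Karzanov's preflow algorithm gives the \(O(n^3)\) bound stated above.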
Common release date and/or deadline
Let us assume that all jobs have a common release date r and/or a common deadline d. We present strongly polynomial-time algorithms achieving constant-factor approximation guarantees for non-migratory cases. Our algorithms consist of two stages. At the first stage, we obtain a lower bound on the minimum energy consumption and calculate intermediate execution times of jobs. Then, at the second stage, we determine the final speeds of jobs and schedule them.
1.1 Common release date and deadline
Now we consider the non-preemptive case of the problem where all jobs arrive at time \(r=0\) and have a shared global deadline d.
1.1.1 The first stage
A lower bound on the objective function can be found in \(O(n^4+n\max \nolimits _{j\in \mathcal {J}}size_j)\) time using the method presented in Sect. 2.1. Here, we propose a more efficient approach for the problem instances under consideration.
We construct the following convex problem with \(e_j\) being the temporary duration of job \(j\in \mathcal {J}\):

$$\sum _{j\in \mathcal {J}} e_j \sum _{l=1}^{size_j}\left( \frac{W_{jl}}{e_j}\right) ^{\alpha }\ \rightarrow \ \min , \qquad (18)$$

$$\sum _{j\in \mathcal {J}} e_j\, size_j \le md, \qquad e_j>0,\ j\in \mathcal {J}. \qquad (19)$$
Constraint (19) bounds the total processor usage in interval [0, d), and the energy consumption is expressed in form (18). This problem is solved using the Lagrangian method. Define the Lagrangian function \(L(e,\lambda )\) as

$$L(e,\lambda )=\sum _{j\in \mathcal {J}}\sum _{l=1}^{size_j} W_{jl}^{\alpha }\, e_j^{1-\alpha } + \lambda \left( \sum _{j\in \mathcal {J}} e_j\, size_j - md\right) .$$
The necessary and sufficient conditions for an optimal solution allow us to find the temporary durations

$$e_j=\frac{B_j\, md}{\sum _{i\in \mathcal {J}} B_i\, size_i},\quad j\in \mathcal {J},$$

where \(B_j=\left( \sum \limits _{l=1}^{size_j} \frac{ W_{jl}^{\alpha }(\alpha -1) }{ size_j }\right) ^{\frac{1}{\alpha }}\).
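Indeed, setting \(\partial L/\partial e_j=0\) gives \((\alpha -1)\sum _{l=1}^{size_j} W_{jl}^{\alpha }\, e_j^{-\alpha }=\lambda \, size_j\), i.e., \(e_j=B_j/\lambda ^{1/\alpha }\); substituting this into the active constraint \(\sum _{j\in \mathcal {J}} e_j\, size_j=md\) yields \(\lambda ^{1/\alpha }=\sum _{i\in \mathcal {J}} B_i\, size_i/(md)\) and hence the expression for \(e_j\) above.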
Note that \(e_j\) may be greater than d for some jobs. In order to avoid such situations, we propose the following procedure.
Let \(m'\) denote the current number of unoccupied processors and \(\mathcal {J}'\) be the set of currently considered jobs. Initially, \(\mathcal {J}':=\mathcal {J}\) and \(m':=m\).
We enumerate the jobs one by one in order of non-increasing values \(B_j\). If the current job i has value \(B_i\ge \frac{\sum _{j\in \mathcal {J}'} B_j size_j}{m'}\), then we assign duration \(p_i:=d\) for this job, and set \(\mathcal {J}':=\mathcal {J}'\setminus \{i\}\) and \(m':=m'-size_i\). We then go to the next job. Otherwise, all jobs \({l\in \mathcal {J}'}\) satisfy the inequality \({B_l<\frac{\sum _{j\in \mathcal {J}'} B_j size_j}{m'}}\), and we assign durations \(p_l:=\frac{B_l m' d}{\sum _{j\in \mathcal {J}'} B_j size_j}\) for them.
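The following Python sketch implements this capping procedure; the dictionaries B and size hold the coefficients \(B_j\) and the processor requirements, and all names are illustrative.

```python
# A minimal sketch of the duration-capping procedure described above.
def cap_durations(jobs, B, size, m, d):
    order = sorted(jobs, key=lambda j: B[j], reverse=True)  # non-increasing B_j
    total = sum(B[j] * size[j] for j in order)              # running sum over J'
    m_prime = m                                             # unoccupied processors
    p = {}
    for idx, i in enumerate(order):
        if B[i] >= total / m_prime:   # duration of i would exceed d: pin p_i = d
            p[i] = d
            total -= B[i] * size[i]   # remove i from J'
            m_prime -= size[i]        # its processors become occupied
        else:                         # every remaining job l has B_l < total / m'
            for l in order[idx:]:
                p[l] = B[l] * m_prime * d / total
            break
    return p  # as stated in the text: p_j <= d and sum_j p_j * size_j = m * d
```

Maintaining the running sum keeps the loop linear after the initial sort, consistent with the time bound stated below.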
The time complexity of our procedure is \(O(n\log n+n\max \nolimits _{j\in \mathcal {J}}size_j)\). It guarantees that \(\sum _{j\in \mathcal {J}}p_j\, size_j= md\) and \(p_j \le d\), and it yields the lower bound \(\sum _{j\in \mathcal {J}} p_j \sum _{l=1}^{size_j} \left( \frac{W_{jl}}{p_j}\right) ^{\alpha }\) on the objective function. At the second stage, we use the non-preemptive list scheduling algorithm of Naroska and Schwiegelshohn (2002) to construct a feasible schedule.
1.1.2 The second stage
Whenever a subset of processors falls idle, the non-preemptive list scheduling algorithm schedules a job that does not require more processors than are available, until all jobs in \(\mathcal {J}\) are assigned. The time complexity of the algorithm is \(O(n^2)\).
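A minimal event-driven Python sketch of this list scheduling step is given below; jobs are (id, duration, size) triples with the durations \(p_j\) from the first stage, and all names are illustrative.

```python
# A minimal event-driven sketch of non-preemptive list scheduling.
import heapq

def list_schedule(jobs, m):
    """Start any pending job that fits whenever enough processors are idle."""
    free = m                       # currently idle processors
    events = []                    # min-heap of (finish_time, released size)
    now = 0.0
    start = {}
    pending = list(jobs)
    while pending:
        fit = next((jb for jb in pending if jb[2] <= free), None)
        if fit is None:            # nothing fits: advance to the next completion
            now, released = heapq.heappop(events)
            free += released
            continue
        pending.remove(fit)
        j, p, s = fit
        start[j] = now
        free -= s
        heapq.heappush(events, (now + p, s))
    return start
```

The simulation processes O(n) start and completion events and scans the pending jobs at each one, giving the \(O(n^2)\) bound above.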
We claim that the length of the constructed schedule is at most \({\left( 2-\frac{1}{m}\right) d}\). (The proof is similar to the proof of Lemma 2 in Sect. 2.4.) By increasing the speed of each job operation by a factor of \(c=2-\frac{1}{m}\), we obtain a schedule of length at most d. Since an operation of work \(W_{jl}\) executed with duration p consumes energy \(p\,(W_{jl}/p)^{\alpha }\), this speed-up changes its energy to \((p/c)\,(cW_{jl}/p)^{\alpha }=c^{\alpha -1}\, p\,(W_{jl}/p)^{\alpha }\); hence, the total energy consumption is increased by a factor of \(\left( 2-\frac{1}{m}\right) ^{\alpha -1}\) in comparison with the lower bound. As a result, we have
Theorem 6
A \(\left( 2-\frac{1}{m}\right) ^{\alpha -1}\)-approximate schedule can be found in \(O(n^2+n\max \nolimits _{j\in \mathcal {J}}size_j)\) time for speed-scaling scheduling problems \({P|size_j,r_j=r,d_j=d|E}\) and \({P|size_j,pmtn*,r_j=r,d_j=d|E}\).
1.2 Common release date or deadline
Here, we study the preemptive problem without migration where all jobs are released at time \(r=0\), but have individual deadlines. Auxiliary durations \(e_j\) of jobs and a lower bound on the objective function are computed in \(O(n^4+n\max \nolimits _{j\in \mathcal {J}}size_j)\) time using the min-cost max-flow model presented in Sect. 2.1.
Then we use the preemptive earliest deadline list scheduling algorithm to construct an approximate solution. Jobs are scheduled in order of non-decreasing deadlines as follows. If \(size_i> \left\lceil \frac{m}{2} \right\rceil \), then job i is assigned at the end of the current schedule. Otherwise, we start job i at the earliest time instant at which \(size_i\) processors are idle and process it for \(e_i\) time units, ignoring the intervals occupied by jobs with \(size_j> \left\lceil \frac{m}{2} \right\rceil \). The time complexity of the algorithm is \(O(n^2)\).
We claim that the completion time \(C_j\) of each job j in the constructed schedule is at most \({\left( 3-\varphi _m\right) d_j}\) (see Lemma 4). Hence, increasing the speeds by a factor of \({\left( 3-\varphi _m\right) }\) yields a feasible schedule, and the total energy consumption is increased by a factor of \(\left( 3-\varphi _m\right) ^{\alpha -1}\).
Obviously, through the interchange of release dates and deadlines, the algorithm presented can also handle the case of jobs with individual release dates but a common deadline. As a result, we have
Theorem 7
A \(\left( 3-\varphi _m\right) ^{\alpha -1}\)-approximate schedule can be found in \(O(n^4 +n\max \limits _{j\in \mathcal {J}}size_j)\) time for speed-scaling scheduling problems \(P|size_j,pmtn*,r_j=r,d_j|E\) and \(P|size_j,pmtn*,r_j,d_j=d|E\).
Recall the definition of \(\varphi _m\); we now prove the following lemma.
Lemma 4
Given m processors and a set of jobs \(\mathcal {J}\) with deadlines \(d_i,\) processing times \(e_i\le d_i\) and sizes \(size_i\), where \(\sum _{j\in \mathcal {J}_i}e_j size_j\le md_i\) for each \(i\in \mathcal {J}\) with \(\mathcal {J}_i=\{j\in \mathcal {J}:\ d_j\le d_i\}\), the completion time \(C_i\) is at most \({\left( 3-\varphi _m\right) d_i}\) for each job \(i\in \mathcal {J}\) in the schedule S constructed by the preemptive earliest deadline list scheduling algorithm.
Proof
Consider an arbitrary deadline \(d_i\), and let i be a job with the maximum completion time \(C_i\) in schedule S among all jobs with deadline equal to \(d_i\).
Note that \(C_j\le C_i\) for all jobs \(j\in \mathcal {J}_i\). Let \(S_i\) denote the part of schedule S which contains only jobs from subset \(\mathcal {J}_i\) and occupies interval \([0,C_i)\). We will show that \(C_i\le {\left( 3-\varphi _m\right) d_i}\).
If at least \(\left\lceil \frac{m+1}{2} \right\rceil \) processors are busy at every time instant of subschedule \(S_i\), we have
Otherwise, let l be the last job in subschedule \(S_i\) that requires \(size_l \le \left\lceil \frac{m}{2} \right\rceil \) processors. It is easy to see that at least \(\left\lceil \frac{m+1}{2} \right\rceil \) processors are busy at every time instant of intervals \([0,C_l-e_l)\) and \([C_l,C_i)\), and at least \(size_l\) processors are utilized in interval \([C_l-e_l,C_l)\). Therefore, the total load of all processors in subschedule \(S_i\) is at least
If \(e_l\ge \frac{C_i}{3-\varphi _m}\), then \(C_i \le \left( 3-\varphi _m\right) d_l \le \left( 3-\varphi _m\right) d_i\).
Otherwise, for \(size_l\ge 1\) we have
\(\square \)