Discrete Optimization
General scheduling non-approximability results in presence of hierarchical communications
Introduction
More and more parallel and distributed systems (cluster, grid and global computing) are becoming available all over the world and are opening new perspectives for developers of a wide range of applications, including data mining, multimedia, and bio-computing. However, this very large potential computing power remains largely unexploited, mainly due to the lack of adequate and efficient software tools for managing this resource.
Scheduling theory is concerned with the optimal allocation of scarce resources to activities over time. Of obvious practical importance, it has been the subject of extensive research since the early 1950s and an impressive amount of literature now exists. The theory dealing with the design of algorithms dedicated to scheduling is much younger, but still has a significant history.
An application to be scheduled on a parallel architecture may be represented by a directed acyclic graph G = (V, E) (the precedence graph), where V is the set of tasks to be executed on a set of m processors and E is the set of precedence constraints. A processing time pi is allotted to each task i ∈ V.
Since the very beginning of the study of scheduling problems, models have kept up with changing and improving technology. Indeed:
- In the PRAM model, in which communication is considered instantaneous, the critical path (the longest path from a source to a sink) gives the length of the schedule. The aim in this model is therefore to find a partial order on the tasks that minimizes an objective function.
- In the homogeneous scheduling delay model, each arc (i, j) ∈ E represents a potential data transfer between tasks i and j, which incurs a communication delay only if i and j are processed on two different processors. The aim in this model is therefore to find a compromise between a sequential execution and a parallel execution.
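The following sketch makes the PRAM remark concrete. It assumes that communication is free and that there are enough processors for every task to start as soon as all its predecessors have finished. Under those assumptions, the earliest-start schedule has length equal to the weighted critical path. The function name and input format (a dict of processing times and a list of arcs) are hypothetical choices made for illustration.

```python
from graphlib import TopologicalSorter


def critical_path_length(p, edges):
    """Longest source-to-sink path of a DAG, each task weighted by its processing time.

    p: dict mapping task -> processing time
    edges: iterable of precedence arcs (i, j): i must finish before j starts
    """
    preds = {v: set() for v in p}
    for i, j in edges:
        preds[j].add(i)
    finish = {}
    # static_order() lists every task after all of its predecessors
    # (it raises graphlib.CycleError if the graph is not acyclic).
    for v in TopologicalSorter(preds).static_order():
        start = max((finish[u] for u in preds[v]), default=0)
        finish[v] = start + p[v]
    return max(finish.values(), default=0)


# Diamond a -> {b, c} -> d: the critical path a, b, d has length 1 + 2 + 1.
print(critical_path_length({"a": 1, "b": 2, "c": 1, "d": 1},
                           [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))  # 4
```

Each task's finishing time is computed once, in topological order, so the sketch runs in time linear in |V| + |E|.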
These two models have been extensively studied over the last few years from both the complexity and the (non)-approximability points of view (see [17], [9]).
With the increasing importance of parallel computing, the question of how to schedule a set of tasks on a given architecture has become critical and has received much attention. More precisely, scheduling problems involving precedence constraints are among the most difficult problems in machine scheduling, and they are among the most studied problems in the domain.
In this paper, we adopt the hierarchical communication model [5], in which communication delays are no longer homogeneous: the processors are grouped into clusters, and communications within the same cluster are much faster than those between processors belonging to different clusters.
This model captures the hierarchical nature of communications in today's parallel computers, as exemplified by networks of PCs or workstations (NOWs) [24], [1]. The use of networks (clusters) of workstations as a parallel computer [24], [1] has not only renewed users' interest in parallelism, but has also brought forth many new and challenging problems related to exploiting the computing power offered by such systems.
Several approaches have been proposed to model these systems, taking this technological development into account:
- Concerning programming systems, we can cite the work of [25], [26], [8], [6].
- Concerning abstract models, we can cite the work of [27], [20], [21], [12], [7], [22], [13] on malleable tasks, which were introduced by [7], [12]. A malleable task is a task that can be executed on several processors and whose execution time depends on the number of processors used to execute it.
As stated above, the model we adopt here is the hierarchical communication model. It addresses one of the major problems that arise in the efficient use of such architectures: the task scheduling problem. The proposed model includes one of the basic architectural features of NOWs, namely the hierarchical communication assumption, i.e., a level-based hierarchy of communication delays with successively higher latencies. In a formal context where both a set of clusters of identical processors and a precedence graph G = (V, E) are given, we consider the following rule for two communicating tasks. If they are executed on the same processor, the corresponding communication delay is negligible. If they are executed on different processors of the same cluster, the delay equals what we call the interprocessor communication delay. If they are executed on different clusters, the communication delay is more significant and is called the intercluster communication delay.
We are given m multiprocessor machines (or clusters, denoted by Πi) that are used to process n precedence-constrained tasks. Each machine Πi (cluster) comprises several identical parallel processors (denoted by $\pi_k^i$). A couple (cij, ϵij) of communication delays is associated with each arc (i, j) between two tasks in the precedence graph. In what follows, cij (resp. ϵij) is called the intercluster (resp. interprocessor) communication delay, and we assume that cij ⩾ ϵij. If tasks i and j are allotted to different machines Πi and Πj, then j must be processed at least cij time units after the completion of i. Similarly, if i and j are processed on the same machine Πi but on different processors $\pi_k^i$ and $\pi_{k'}^i$ (with k ≠ k′), then j can only start ϵij units of time after the completion of i. However, if i and j are executed on the same processor, then j can start immediately after the end of i. The communication overhead (intercluster or interprocessor delay) does not interfere with the availability of processors, and any processor may execute any task. Our goal is to find a feasible schedule of the tasks that minimizes the makespan, i.e., the time needed to process all tasks subject to the precedence graph.
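Formally, in the hierarchical scheduling delay model, a hierarchical couple of values (cij, ϵij) with ϵij ⩽ cij is associated with each arc (i, j) ∈ E. Denoting by ti the starting time of task i, a feasible schedule must satisfy, for every arc (i, j) ∈ E:
- if $\Pi_i = \Pi_j$ and i and j are executed on the same processor $\pi_k^i$, then $t_i + p_i \le t_j$;
- else if $\Pi_i = \Pi_j$ and i and j are executed on processors $\pi_k^i$ and $\pi_{k'}^i$ with $k \ne k'$, then $t_i + p_i + \epsilon_{ij} \le t_j$;
- else ($\Pi_i \ne \Pi_j$), $t_i + p_i + c_{ij} \le t_j$.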
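The constraints above can be checked mechanically for a given schedule. The following Python sketch makes several assumptions: integer starting times, a `Placement` record (cluster, processor index, start time) for each task, and arcs stored as a dict from (i, j) to (cij, ϵij). All of these names are hypothetical. The sketch also verifies that no processor runs two tasks at once. Communications are not counted as processor occupation, following the model. It does not check how many processors each cluster has.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    cluster: int    # which cluster (machine) executes the task
    processor: int  # index k of the processor inside that cluster
    start: int      # starting time t_i


def is_feasible(p, arcs, placement):
    """Check a schedule against the hierarchical communication delay constraints.

    p: task -> processing time p_i
    arcs: (i, j) -> (c_ij, eps_ij) with eps_ij <= c_ij
    placement: task -> Placement
    """
    for (i, j), (c, eps) in arcs.items():
        a, b = placement[i], placement[j]
        if a.cluster != b.cluster:
            delay = c        # different clusters: intercluster delay
        elif a.processor != b.processor:
            delay = eps      # same cluster, different processors: interprocessor delay
        else:
            delay = 0        # same processor: j may start right after i
        if a.start + p[i] + delay > b.start:
            return False
    # A processor runs at most one task at a time; communications do not occupy it.
    by_proc = {}
    for task, pl in placement.items():
        by_proc.setdefault((pl.cluster, pl.processor), []).append(
            (pl.start, pl.start + p[task]))
    for intervals in by_proc.values():
        intervals.sort()
        for (_, end), (next_start, _) in zip(intervals, intervals[1:]):
            if next_start < end:
                return False
    return True


def makespan(p, placement):
    """C_max: completion time of the last task."""
    return max(pl.start + p[t] for t, pl in placement.items())
```

Take (cij, ϵij) = (3, 1), unit tasks and a single arc (1, 2). Task 2 may start at time 1 on the processor of task 1, at time 2 on another processor of the same cluster, and only at time 4 on another cluster.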
Note that the hierarchical model considered here is a generalization of the classical scheduling model with communication delays [9], [11]. Consider, for instance, the case where cij = ϵij for every arc (i, j) of the precedence graph: the hierarchical model then reduces exactly to the classical scheduling model with communication delays. In this article, we study how hierarchical communications affect the hardness of approximating the multiprocessor scheduling problem when the processors of the parallel architecture are partitioned into an unbounded number of clusters. We study the case where there are only l ⩾ 4 fully connected processors per cluster, denoted in what follows by $\bar{P}(Pl)$.
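The reduction to the classical model can be seen from the delay rule alone. The following Python sketch uses two hypothetical helper functions, one giving the minimum gap between the end of i and the start of j in each model. It shows that with ϵij = cij the cluster structure no longer matters.

```python
def hierarchical_delay(same_cluster, same_processor, c, eps):
    """Minimum gap between the end of i and the start of j for an arc (i, j)."""
    if same_processor:
        return 0
    return eps if same_cluster else c


def classical_delay(same_processor, c):
    """Homogeneous model: delay c whenever i and j run on different processors."""
    return 0 if same_processor else c


# The three possible relative placements of i and j: (same cluster?, same processor?).
placements = [(True, True), (True, False), (False, False)]
c = 3
# With eps_ij = c_ij, the cluster structure no longer matters:
print(all(hierarchical_delay(sc, sp, c, c) == classical_delay(sp, c)
          for sc, sp in placements))  # True
# With eps_ij < c_ij, two processors of the same cluster communicate faster:
print(hierarchical_delay(True, False, c, 1), classical_delay(False, c))  # 1 3
```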
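Using an extension of the classical three-field notation of Lenstra et al. [17], our problems can be written as
- $\bar{P}(Pl) \mid prec; (c_{ij}, \epsilon_{ij}) = (c, 1); p_i = 1 \mid C_{max}$ and
- $\bar{P}(Pl) \mid prec; (c_{ij}, \epsilon_{ij}) = (c, c'); p_i = 1 \mid C_{max}$ with c > c′.
Note that the values c and l are considered to be constants in what follows.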
Complexity results: On the negative side, Bampis et al. [4] studied the impact of hierarchical communications on the complexity of the associated problem. They considered the simplest case, i.e., the problem with (cij, ϵij) = (1, 0) and pi = 1 under the Cmax criterion. They showed that this problem does not admit a polynomial-time approximation algorithm with a ratio guarantee better than 5/4 (unless P = NP). Recently, Giroudeau [15] proved that there is no hope of finding a ρ-approximation with ρ < 6/5 for the couple of communication delays (cij, ϵij) = (2, 1). If duplication is allowed, Bampis et al. [3] extended the result of [10] to the case of hierarchical communications, providing an optimal algorithm for the problem with task duplication. These complexity results are given in Table 1.

Remark 1. For the problem with (cij, ϵij) = (2, 1), deciding whether a schedule of length at most Cmax = 5 (resp. Cmax = 3) exists is NP-complete (resp. polynomial). For Cmax = 4, we conjecture that there exists a polynomial-time algorithm. These results are given in [15].
Approximation results: On the positive side, the authors of [2] presented an 8/5-approximation algorithm for a hierarchical problem, based on an integer linear programming formulation: they relax the integrality constraints and produce a feasible schedule by rounding. This result has been extended to a more general problem, leading to an approximation algorithm whose performance ratio is given in Table 2.
The challenge is to determine an approximability threshold for the two more general problems, with (cij, ϵij) = (c, 1) and with (cij, ϵij) = (c, c′), where c′ < c.
In the classical scheduling model with communication delays, consider the UET-UCT (Unit Execution Time, Unit Communication Time) homogeneous scheduling communication delays problem. We know (see [19]) that its decision problem is NP-complete for Cmax ⩾ 6 and polynomial for Cmax ⩽ 5.

Recently, the authors of [16] studied the UET-LCT (Unit Execution Time, Large Communication Time) homogeneous scheduling communication delays problem. In this problem, all tasks of the precedence graph have unit execution times and the multiprocessor is composed of an unrestricted number of machines. The parameter c denotes the communication delay between two tasks i and j that are subject to a precedence constraint and are processed on two different machines. They proved that there is no ρ-approximation with ρ < 1 + 1/(c + 4) for this problem (unless P = NP). The problem becomes polynomial whenever the makespan is at most (c + 1). The case (c + 2) is still partially open.
Similarly, for the hierarchical communication delay model with the couple of communication delays (1, 0), the authors of [4] proved that there is no ρ-approximation with ρ < 5/4. This problem is called in what follows the UET-UCT hierarchical scheduling communication delay problem.
Thus, an interesting question arises: “Does the gap, for the UCT case, between the homogeneous and hierarchical scheduling communication delay problems persist for the LCT case?”
This article provides the answer to this question.
In order to determine the threshold for the two problems described above, we prove the following. Deciding whether an instance of the problem with (cij, ϵij) = (c, 1) and c ⩾ 3 (resp. with (cij, ϵij) = (c, c′), c > c′ and c′ > 1) has a schedule of length at most (c + 3) is NP-complete. We also extend the non-approximability result to the criterion of the sum of the completion times ΣCj, where Cj = tj + 1 is the completion time of task j. This result is obtained by combining the polynomial-time transformation used in the NP-completeness proof for makespan minimization with the gap technique proposed by Hoogeveen et al. [18].
We also prove that deciding whether an instance has a schedule of length at most (c + 1) is polynomial.
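The step from these decision results to non-approximability rests on a standard gap argument for makespan. Suppose all makespans are integers, as they are with unit processing times and integer delays. Then NP-completeness of deciding "Cmax ⩽ k" rules out any ρ-approximation with ρ < (k + 1)/k, unless P = NP. The following Python sketch uses a hypothetical helper `gap_lower_bound`. Its values coincide with the 5/4 and 6/5 thresholds quoted for [4] and [15] and with the 7/6 threshold implied by [19] (Cmax = 6). With k = c + 4 it gives the 1 + 1/(c + 4) threshold of [16], and k = c + 3 gives the threshold corresponding to the (c + 3) result above. The sketch covers the makespan case only; the ΣCj extension relies on the separate technique of [18].

```python
from fractions import Fraction


def gap_lower_bound(k):
    """Approximation ratio no polynomial-time algorithm can beat unless P = NP,
    assuming that deciding "C_max <= k" is NP-complete and makespans are integers.

    If an algorithm always returned a schedule of length <= rho * OPT with
    rho < (k + 1) / k, then on instances with OPT <= k its output would be
    < k + 1, i.e. <= k, while on the other instances every schedule has length
    >= k + 1. Comparing the output with k would decide the NP-complete question.
    """
    return Fraction(k + 1, k)


print(gap_lower_bound(4), gap_lower_bound(5), gap_lower_bound(6))  # 5/4 6/5 7/6
c = 3
print(gap_lower_bound(c + 4), gap_lower_bound(c + 3))  # 8/7 7/6
```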
This article is organized as follows. In the next section, we give some preliminary results concerning an auxiliary problem that will be used in the polynomial-time transformation proving the non-approximability results. In Section 3, we prove that deciding whether an instance has a schedule of length at most (c + 3) is NP-complete, and we then generalize this result. We extend it to the criterion of the sum of the completion times by proving a non-approximability threshold for this criterion. We show that deciding whether an instance has a feasible schedule of length at most (c + 1) is solvable in polynomial time. In the last section, we discuss our results.
Preliminary results
In this section, we give a polynomial-time transformation proving the NP-completeness of an auxiliary problem (defined below). This problem is then used in the polynomial-time transformations establishing the NP-completeness of the scheduling problems.
- The source problem is Monotone-one-in-three-3SAT. Let us first recall its definition.
Instance of problem Monotone-one-in-three-3SAT:
  - Let $\{x_1, \ldots, x_n\}$ be a set of n boolean variables.
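The snippet breaks off before the rest of the definition. In the standard formulation of Monotone-one-in-three-3SAT, every clause consists of three unnegated variables. The question is whether some truth assignment makes exactly one variable true in every clause. The following exponential-time brute-force sketch only illustrates that condition and plays no role in the reduction. The function name and the 0-based variable indexing are hypothetical.

```python
from itertools import product


def one_in_three_sat(n, clauses):
    """Brute-force decision of Monotone-one-in-three-3SAT (exponential time).

    n: number of boolean variables, indexed 0..n-1
    clauses: list of 3-tuples of variable indices; all literals are unnegated
    Returns an assignment (tuple of bools) making exactly one variable true in
    every clause, or None if no such assignment exists.
    """
    for assignment in product((False, True), repeat=n):
        if all(sum(assignment[v] for v in clause) == 1 for clause in clauses):
            return assignment
    return None


print(one_in_three_sat(4, [(0, 1, 2), (0, 1, 3)]))  # (False, False, True, True)
# Each variable below appears in 3 of the 4 clauses, so 3 * (#true) would have to
# equal 4: impossible, hence no solution.
print(one_in_three_sat(4, [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]))  # None
```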
Non-approximability results
In this section, several NP-completeness proofs are given. First, we prove that deciding whether an instance of the problem with (cij, ϵij) = (c, 1), c ⩾ 3, and pi = 1 has a schedule of length at most (c + 3) is NP-complete (see Theorem 3). Theorem 6 generalizes this result to the problem with (cij, ϵij) = (c, c′) and c > c′ + 1 > 2.
We also prove NP-completeness for the problem … (see Theorem 7). The polynomial-time
Discussion
Combined with previous results, our results answer the question asked at the beginning of this paper; the answer is summarized in Fig. 5. The hierarchical model considered here generalizes the classical scheduling model with communication delays. In both cases (UET-UCT and UET-LCT), the problem becomes more difficult in the hierarchical model.
Moreover, for c ⩾ 4 the problem is proved to be easier for the hierarchical model. Indeed, in [16], the authors proved that there is no possibility of
Conclusion
In this paper, we first proved that deciding whether an instance of the problem with (cij, ϵij) = (c, 1), c ⩾ 3, has a schedule of length at most (c + 3) is NP-complete. We thereby generalized the results given by Bampis et al. [4] and Giroudeau [15].
This result should be compared with the result of [19]: for the homogeneous problem with cij = 1 and pi = 1, deciding whether a schedule of length at most Cmax = 6 exists is NP-complete. Our result implies that there is no ρ-approximation algorithm with ρ < 1 + 1/(c + 3), unless P = NP. In addition, we show that there is
Acknowledgements
The authors thank the two anonymous referees of the journal for their helpful corrections and suggestions which improved the readability of this article.
References (27)
- et al. An approximation algorithm for the precedence constrained scheduling problem with hierarchical communications. Theoretical Computer Science (2003).
- et al. Optimization and approximation in deterministic sequencing and scheduling theory: A survey. Annals of Discrete Mathematics (1979).
- et al. Three, four, five, six, or the complexity of scheduling with communication delays. Operations Research Letters (1994).
- Guidelines for data-parallel cycle-stealing in networks of workstations I: On maximizing expected output. Journal of Parallel Distributed Computing (1999).
- et al. A case for NOW (networks of workstations). IEEE Micro (1995).
- et al. A heuristic for the precedence constrained multiprocessor scheduling problem with hierarchical communications.
- et al. Using duplication for multiprocessor scheduling problem with hierarchical communications. Parallel Processing Letters (2000).
- et al. On the hardness of approximating the precedence constrained multiprocessor scheduling problem with hierarchical communications. RAIRO-RO (2002).
- et al. On optimal strategies for cycle-stealing in networks of workstations. IEEE Transactions on Computers (1997).
- et al. Dynamic load balancing for ocean circulation model adaptive meshing.