Discrete Optimization
General scheduling non-approximability results in presence of hierarchical communications

https://doi.org/10.1016/j.ejor.2006.11.020

Abstract

We investigate the problem of minimizing the makespan (resp. the sum of the completion times) for the multiprocessor scheduling problem in the presence of hierarchical communications. We consider a model with two levels of communication: interprocessor and intercluster. The processors are grouped in fully connected clusters. We propose general non-approximability results in the case where all the tasks of the precedence graph have unit execution times, and where the multiprocessor is composed of an unrestricted number of machines with l ≥ 4 identical processors each.

Introduction

More and more parallel and distributed systems (cluster, grid and global computing) are becoming available all over the world, opening new perspectives for developers of a wide range of applications including data mining, multimedia, and bio-computing. However, this very large potential of computing power remains largely unexploited, mainly due to the lack of adequate and efficient software tools for managing this resource.

Scheduling theory is concerned with the optimal allocation of scarce resources to activities over time. Of obvious practical importance, it has been the subject of extensive research since the early 1950s and an impressive amount of literature now exists. The theory dealing with the design of algorithms dedicated to scheduling is much younger, but still has a significant history.

An application to be scheduled on a parallel architecture may be represented by an acyclic graph G = (V, E) (or precedence graph), where V designates the set of tasks, which will be executed on a set of m processors, and where E represents the set of precedence constraints. A processing time is allotted to each task i ∈ V.

From the very beginning of the study of scheduling problems, models have kept up with changing and improving technology. Indeed,

  • In the PRAM model, in which communication is considered instantaneous, the critical path (the longest path from a source to a sink) gives the length of the schedule. So the aim, in this model, is to find a partial order on the tasks, in order to minimize an objective function.

  • In the homogeneous scheduling delay model, each arc (i, j) ∈ E represents the potential data transfer between task i and task j provided that i and j are processed on two different processors. So the aim, in this model, is to find a compromise between a sequential execution and a parallel execution.

These two models have been extensively studied over the last few years from both the complexity and the (non)-approximability points of view (see [17], [9]).
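The role of the critical path in the PRAM model can be illustrated with a minimal sketch. It assumes free communications and enough processors to run every ready task at once; the function name `critical_path_length` and the dictionary-based graph encoding are illustrative choices, not notation from the paper.

```python
from collections import defaultdict, deque


def critical_path_length(durations, arcs):
    """Longest path (sum of task durations) in a precedence DAG.

    durations: dict mapping each task to its processing time.
    arcs: iterable of (i, j) pairs meaning "i must finish before j starts".
    """
    successors = defaultdict(list)
    indegree = {task: 0 for task in durations}
    for i, j in arcs:
        successors[i].append(j)
        indegree[j] += 1

    # Earliest completion time of each task when communications are free.
    finish = dict(durations)
    ready = deque(task for task, deg in indegree.items() if deg == 0)
    visited = 0
    while ready:
        i = ready.popleft()
        visited += 1
        for j in successors[i]:
            finish[j] = max(finish[j], finish[i] + durations[j])
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)

    if visited != len(durations):
        raise ValueError("precedence graph contains a cycle")
    return max(finish.values(), default=0)


# Diamond-shaped graph with unit execution times: a -> {b, c} -> d.
unit = {"a": 1, "b": 1, "c": 1, "d": 1}
diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
length = critical_path_length(unit, diamond)
```

With communication delays, this value is only a lower bound on the makespan, which is why the delay models above require a trade-off between sequential and parallel execution.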

With the increasing importance of parallel computing, the question of how to schedule a set of tasks on a given architecture becomes critical, and has received much attention. More precisely, scheduling problems involving precedence constraints are among the most difficult problems in the area of machine scheduling and among the most studied problems in the domain.

In this paper, we adopt the hierarchical communication model [5], in which the communication delays are no longer homogeneous: the processors are grouped into clusters, and communications inside the same cluster are much faster than those between processors belonging to different clusters.

This model incorporates the hierarchical nature of the communications in today’s parallel computers, as shown by many networks of PCs or workstations (NOWs) [24], [1]. The use of networks (clusters) of workstations as a parallel computer [24], [1] has not only renewed the user’s interest in the domain of parallelism, but has also brought forth many new challenging problems related to the exploitation of the potential computing power offered by such a system.

Several approaches have been proposed to model these systems, taking this technological development into account:

  • For the programming-system approach, we can cite the work in [25], [26], [8], [6].

  • For the abstract-model approach, we can cite the work in [27], [20], [21], [12], [7], [22], [13] on malleable tasks, introduced in [7], [12]. A malleable task is a task that can be computed on several processors and whose execution time depends on the number of processors used for its execution.

As stated above, the model we adopt here is the hierarchical communication model, which addresses one of the major problems that arise in the efficient use of such architectures: the task scheduling problem. The proposed model includes one of the basic architectural features of NOWs: the hierarchical communication assumption, i.e., a level-based hierarchy of communication delays with successively higher latencies. In a formal context where both a set of clusters of identical processors and a precedence graph G = (V, E) are given, we consider that if two communicating tasks are executed on the same processor (resp. on different processors of the same cluster), then the corresponding communication delay is negligible (resp. is equal to what we call the interprocessor communication delay). On the contrary, if these tasks are executed on different clusters, then the communication delay is more significant and is called the intercluster communication delay.

We are given m multiprocessor machines (or clusters, denoted by Πi) that are used to process n precedence-constrained tasks. Each machine Πi (cluster) comprises several identical parallel processors (denoted by πki). A couple (cij, ϵij) of communication delays is associated with each arc (i, j) between two tasks in the precedence graph. In what follows, cij (resp. ϵij) is called the intercluster (resp. interprocessor) communication delay, and we consider that cij ≥ ϵij. If tasks i and j are allotted to different machines Πi and Πj, then j must be processed at least cij time units after the completion of i. Similarly, if i and j are processed on the same machine Πi but on different processors πki and πk′j (with k ≠ k′), then j can only start ϵij units of time after the completion of i. However, if i and j are executed on the same processor, then j can start immediately after the end of i. The communication overhead (intercluster or interprocessor delay) does not interfere with the availability of processors, and any processor may execute any task. Our goal is to find a feasible schedule of tasks minimizing the makespan, i.e., the time needed to process all tasks subject to the precedence graph.

Formally, in the hierarchical scheduling delay model, a hierarchical couple of values (cij, ϵij) with ϵij ≤ cij is associated with each arc (i, j) ∈ E such that:

  • if Πi = Πj and πki = πkj, then ti + pi ≤ tj,

  • else if Πi = Πj and πki ≠ πk′j with k ≠ k′, then ti + pi + ϵij ≤ tj,

  • else (Πi ≠ Πj) ti + pi + cij ≤ tj,

where ti denotes the starting time of task i and pi its duration. The objective is to find a schedule, i.e., an allocation of each task to a time interval on one processor, such that communication delays are taken into account and the completion time (makespan) is minimized (the makespan is denoted by Cmax and corresponds to max{ti + pi : i ∈ V}). In what follows, we consider the simplest case: ∀i ∈ V, pi = 1, cij = c ≥ 2, ϵij = c′ ≥ 1 with c ≥ c′.
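The three precedence constraints can be checked mechanically for a given schedule. The following is a minimal sketch under stated assumptions: unit execution times, integer starting times, and a placement that maps each task to a hypothetical `(cluster, processor)` pair. The names `required_delay`, `is_feasible` and `makespan` are illustrative and do not come from the paper, and the sketch does not verify that a cluster uses at most l processors.

```python
def required_delay(place_i, place_j, c, eps):
    """Delay imposed on an arc (i, j) given the placement of both tasks.

    A placement is a (cluster, processor) pair.
    """
    if place_i == place_j:          # same processor: no delay
        return 0
    if place_i[0] == place_j[0]:    # same cluster, different processors
        return eps
    return c                        # different clusters


def is_feasible(start, place, arcs, c, eps):
    """Check a unit-execution-time schedule against the hierarchical model.

    start: dict task -> integer starting time t_i.
    place: dict task -> (cluster, processor).
    arcs:  iterable of precedence arcs (i, j).
    """
    # With unit durations and integer starts, two tasks on the same
    # processor overlap exactly when they share a starting time.
    used_slots = set()
    for task, t in start.items():
        slot = (place[task], t)
        if slot in used_slots:
            return False
        used_slots.add(slot)

    # t_i + p_i + delay <= t_j for every arc, with p_i = 1.
    return all(
        start[i] + 1 + required_delay(place[i], place[j], c, eps) <= start[j]
        for i, j in arcs
    )


def makespan(start):
    """Cmax = max over tasks of t_i + p_i, with p_i = 1."""
    return max(t + 1 for t in start.values())


# One arc a -> b with (c, eps) = (3, 1).
arcs = [("a", "b")]
same_processor = is_feasible({"a": 0, "b": 1}, {"a": (0, 0), "b": (0, 0)}, arcs, 3, 1)
same_cluster = is_feasible({"a": 0, "b": 1}, {"a": (0, 0), "b": (0, 1)}, arcs, 3, 1)
other_cluster = is_feasible({"a": 0, "b": 4}, {"a": (0, 0), "b": (1, 0)}, arcs, 3, 1)
```

In this small example the successor may start at time 1 on the same processor, at time 2 at the earliest on another processor of the same cluster, and at time 4 at the earliest on another cluster.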

Note that the hierarchical model that we consider here is a generalization of the classical scheduling model with communication delays [9], [11]. Consider, for instance, that for every arc (i, j) of the precedence graph we have cij = ϵij. In such a case, the hierarchical model is exactly the classical scheduling communication delays model. In this article, we study the impact of introducing the notion of hierarchical communications on the hardness of approximating the multiprocessor scheduling problem in which the processors of the parallel architecture are partitioned into an unbounded number of clusters (we study the case where there are only l ≥ 4 fully connected processors per cluster, denoted in what follows by P¯(Pl ≥ 4)).

Using an extension of the classical notation of Lenstra et al. [17], our problems can be written as

  • P¯(Pl ≥ 4)|prec; (cij, ϵij) = (c, 1); pi = 1|Cmax and

  • P¯(Pl ≥ 4)|prec; (cij, ϵij) = (c, c′); pi = 1|Cmax with c > c′.

Note that the values c and l are considered as constant in the following.

Complexity results: On the negative side, Bampis et al. [4] studied the impact of hierarchical communications on the complexity of the associated problem. They considered the simplest case, i.e., the problem P¯(P2)|prec; (cij, ϵij) = (1, 0); pi = 1|Cmax, and they showed that this problem does not possess a polynomial-time approximation algorithm with a ratio guarantee better than 5/4 (unless P=NP). Recently, Giroudeau [15] proved that there is no hope of finding a ρ-approximation with ρ < 6/5 for the couple of communication delays (cij, ϵij) = (2, 1). If duplication is allowed, Bampis et al. [3] extended the result of [10] to the case of hierarchical communications, providing an optimal algorithm for P¯(P2)|prec; (cij, ϵij) = (1, 0); pi = 1; dup|Cmax. These complexity results are given in Table 1.

Remark 1

For the problem P¯(P2)|prec; (cij, ϵij) = (2, 1); pi = 1|Cmax, the decision problem is NP-complete for Cmax = 5 and polynomial for Cmax = 3. For Cmax = 4, we conjecture that there exists a polynomial-time algorithm. These results are in [15].

Approximation results: On the positive side, the authors of [2] presented an 8/5-approximation algorithm for the problem P¯(P2)|prec; (cij, ϵij) = (1, 0); pi = 1|Cmax, which is based on an integer linear programming formulation. They relax the integrality constraints and produce a feasible schedule by rounding. This result is extended to the problem P¯(Pl)|prec; (cij, ϵij) = (1, 0); pi = 1|Cmax, leading to a 4l/(2l + 1)-approximation algorithm (see Table 2).
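As a quick consistency check, reading the general ratio as 4l/(2l + 1), the case l = 2 gives back the 8/5 guarantee quoted for P¯(P2), and the ratio stays below 2 for every l:

```latex
\[
\left.\frac{4l}{2l+1}\right|_{l=2} = \frac{8}{5},
\qquad
\frac{4l}{2l+1} = 2 - \frac{2}{2l+1} < 2 \quad \text{for all } l \ge 1 .
\]
```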

The challenge is to determine a threshold for the approximation algorithms for the two more general problems: P¯(Pl ≥ 4)|prec; (cij, ϵij) = (c, 1); pi = 1|Cmax and P¯(Pl ≥ 4)|prec; (cij, ϵij) = (c, c′); pi = 1|Cmax with c′ < c.

In the classical scheduling communication delay model, we know (see [19]) that the decision problem associated with P¯|prec; cij = 1; pi = 1|Cmax becomes NP-complete even for Cmax ≥ 6, and that it is polynomial for Cmax ≤ 5 (this problem is called in what follows the UET-UCT (Unit Execution Time Unit Communication Time) homogeneous scheduling communication delays problem). Recently, in [16], the authors proved that there is no possibility of finding a ρ-approximation with ρ < 1 + 1/(c + 4) (unless P=NP) for the case where all tasks of the precedence graph have unit execution times, where the multiprocessor is composed of an unrestricted number of machines, and where c denotes the communication delay between two tasks i and j that are subject to a precedence constraint and have to be processed by two different machines (this problem is called in the following the UET-LCT (Unit Execution Time Large Communication Time) homogeneous scheduling communication delays problem). The problem becomes polynomial whenever the makespan is at most (c + 1). The case of (c + 2) is still partially open.

Similarly, for the hierarchical communication delay model with the couple of communication delay values (1, 0), the authors proved in [4] that there is no possibility of finding a ρ-approximation with ρ < 5/4 (this problem is called in the following the UET-UCT hierarchical scheduling communication delay problem).

Thus, an interesting question arises: “Does the gap, for the UCT case, between the homogeneous and hierarchical scheduling communication delay problems persist for the LCT case?”

This article provides the answer to this question.

In order to give the threshold for the two problems described above, we prove that the problem of deciding whether an instance of P¯(Pl ≥ 4)|prec; (cij, ϵij) = (c, 1); pi = 1|Cmax with c ≥ 3 (resp. P¯(Pl ≥ 4)|prec; (cij, ϵij) = (c, c′); pi = 1|Cmax with c > c′, c > 1) has a schedule of length at most (c + 3) is NP-complete. We also extend the non-approximability result to the sum of the completion times, denoted in what follows by ∑j Cj with Cj = tj + 1. In order to obtain this result, the polynomial-time transformation used in the NP-completeness proof for makespan minimization and the gap technique proposed by Hoogeveen et al. [18] are used.
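The step from an NP-complete decision problem to a non-approximability threshold follows the standard gap argument: since schedule lengths are integers, a ρ-approximation with ρ < (k + 1)/k would return a schedule of length at most k whenever one exists, and would therefore decide the question "Cmax ≤ k?" in polynomial time. A minimal sketch of the resulting arithmetic follows; the helper name `gap_threshold` is hypothetical.

```python
from fractions import Fraction


def gap_threshold(k):
    """Approximation ratio ruled out (unless P = NP) below this value,
    assuming that deciding "Cmax <= k" is NP-complete and that all
    schedule lengths are integers."""
    return Fraction(k + 1, k)


# Decision bound k = c + 3 for the hierarchical problem studied here.
hierarchical = {c: gap_threshold(c + 3) for c in (3, 4, 5)}

# Bounds quoted earlier in the text for comparison.
bampis_uct = gap_threshold(4)      # 5/4 for (cij, eps_ij) = (1, 0)
giroudeau = gap_threshold(5)       # 6/5 for (cij, eps_ij) = (2, 1), Cmax = 5
homogeneous = gap_threshold(3 + 4) # 1 + 1/(c + 4) with c = 3
```

For a schedule length of at most (c + 3), this gives the ratio 1 + 1/(c + 3), which is larger than the 1 + 1/(c + 4) threshold known for the homogeneous UET-LCT problem with the same c.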

We also prove that the problem of deciding whether an instance of P¯(Pl)|prec; (cij, ϵij) = (c, c′); pi = 1|Cmax has a schedule of length at most (c + 1) is polynomial.

This article is organized as follows: in the next section, we give some preliminary results concerning the problem which will be used for the polynomial-time transformation in order to prove the non-approximability results. In Section 3, we prove that the problem of deciding whether an instance of P¯(Pl ≥ 4)|prec; (cij, ϵij) = (c, 1); pi = 1|Cmax has a schedule of length at most (c + 3) is NP-complete. These results will then be generalized. We extend this result to the criterion of the sum of the completion times by proving that there is no possibility of finding a ρ-approximation algorithm with ρ strictly less than 1 + 1/(2c + 4). We show that the problem of deciding whether an instance of P¯(Pl)|prec; (cij, ϵij) = (c, c′); pi = 1|Cmax has a feasible schedule of length at most (c + 1) is solvable in polynomial time. In the last section, we discuss our results.

Preliminary results

In this section, we give one polynomial-time transformation in order to prove the NP-completeness of the problem P2 (the definition of this problem is given below). This problem is used in the polynomial-time transformation for the NP-completeness of scheduling problems.

  • The problem P1 is the Monotone-one-in-three-3SAT problem. Let us first recall its definition.

    • Instance of problem Monotone-one-in-three-3SAT:

      • Let V = {x1, …, xn} be a set of n boolean variables.

Non-approximability results

In this section, several proofs of NP-completeness will be provided. First, we prove that the problem of deciding whether an instance of P¯(Pl ≥ 4)|prec; (cij, ϵij) = (c ≥ 3, 1); pi = 1|Cmax has a schedule of length at most (c + 3) is NP-complete (see Theorem 3). This result will be generalized in Theorem 6 for the problem P¯(Pl ≥ 4)|prec; (cij, ϵij) = (c, c′); pi = 1|Cmax with c > c′ + 1 > 2.

We also prove the NP-completeness of the problem P¯(Pl ≥ 4)|prec; (cij, ϵij) = (3, 2); pi = 1|Cmax (see Theorem 7). The polynomial-time

Discussion

With the previous results, the answer to the question asked at the beginning of this paper is provided in Fig. 5. Since the hierarchical model that we consider here is a generalization of the classical scheduling model with communication delay, for both cases (UET-UCT and UET-LCT) the problem becomes more difficult for the hierarchical model.

Moreover, for c  4 the problem is proved to be easier for the hierarchical model. Indeed, in [16], the authors proved that there is no possibility of

Conclusion

In this paper, we first proved that the problem of deciding whether an instance of P¯(Pl ≥ 4)|prec; (cij, ϵij) = (c ≥ 3, c′ < c); pi = 1|Cmax has a schedule of length at most (c + 3) is NP-complete. We generalized the results given by Bampis et al. [4] and Giroudeau [15].

This result is to be compared with the result of [19], which states that P¯|prec; cij = 1; pi = 1|Cmax = 6 is NP-complete. Our result implies that there is no ρ-approximation algorithm with ρ < 1 + 1/(c + 3), unless P=NP. In addition, we show that there is

Acknowledgements

The authors thank the two anonymous referees of the journal for their helpful corrections and suggestions which improved the readability of this article.

References (27)

  • R. Blumafe, D.S. Park, Scheduling on networks of workstations, in: 3rd Int. Symp. High Performance Distr. Computing,...
  • B. Chen, C.N. Potts, G.J. Woeginger, A review of machine scheduling: Complexity, algorithms and approximability,...
  • P. Chrétienne et al., C.P.M. scheduling with small interprocessor communication delays, Operations Research (1991)