Task scheduling in multiprocessing systems using duplication
Introduction
The main focus of task scheduling is to find an optimal schedule which minimizes the parallel execution time of an application. An application can be broken down into a set of tasks. We represent an application using a directed acyclic graph (DAG) G defined by the tuple G = (V, E), where V, the nodes, represents the set of tasks and E, the edges, represents the communication between tasks. For any task v_i ∈ V, w(v_i) denotes its computational cost. An edge from v_i to v_j, where (v_i, v_j) ∈ E, represents the precedence relationship between these two tasks. The communication cost from v_i to v_j is denoted by c(v_i, v_j). If (v_i, v_j) ∈ E, then v_i is called the immediate predecessor of v_j and v_j is called the immediate successor of v_i. Given a task v_i, its set of immediate predecessors, denoted by pred(v_i), is defined as pred(v_i) = {v_j ∈ V : (v_j, v_i) ∈ E}. A task is called a join task if it has two or more immediate predecessors. The set of immediate successors of v_i is denoted by succ(v_i) and is defined as succ(v_i) = {v_j ∈ V : (v_i, v_j) ∈ E}. desc(v_i) is the set of all the descendants of task v_i, that is, the set of tasks reachable from v_i. It is assumed that there is one entry task and one exit task for the DAG, and that the number of processors available is |V|. Task scheduling algorithms have applications in distributed as well as grid computing environments [1], [2].
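The graph model above can be captured in a few lines. The following is a minimal sketch; the class name DAG and the field names w, c, pred, and succ are our own, not the paper's:

```python
from collections import defaultdict

class DAG:
    """A task graph with per-task computation costs and per-edge
    communication costs, following the model described above."""

    def __init__(self):
        self.w = {}                   # w[v]: computational cost of task v
        self.c = {}                   # c[(u, v)]: communication cost of edge (u, v)
        self.succ = defaultdict(set)  # immediate successors of each task
        self.pred = defaultdict(set)  # immediate predecessors of each task

    def add_task(self, v, cost):
        self.w[v] = cost

    def add_edge(self, u, v, comm):
        self.c[(u, v)] = comm
        self.succ[u].add(v)
        self.pred[v].add(u)

    def desc(self, v):
        """All descendants of v (tasks reachable from v)."""
        seen, stack = set(), list(self.succ[v])
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend(self.succ[u])
        return seen

    def is_join(self, v):
        """A join task has two or more immediate predecessors."""
        return len(self.pred[v]) >= 2
```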
Task scheduling is one of the most challenging problems in both parallel and distributed computing environments. Whether duplication techniques are employed or not, scheduling is a well-known NP-complete problem [6], [7], [8]. In order to deal with this problem, many heuristic methods have been proposed [1], [2], [10]. Some of these methods produce an optimal schedule only when certain constraints are satisfied. For example, for task graphs with small communication delays, Colin and Chretienne [4] proposed a lower bound (LWB) algorithm that gives an optimal solution, provided that the following condition holds throughout the DAG: for any join task v_i, w(v_j) ≥ max_{v_k ∈ pred(v_i)} c(v_k, v_i) for every v_j ∈ pred(v_i). That is, for any join task v_i, the computational cost of each immediate predecessor is not less than the largest incoming communication cost with respect to v_i. Task duplication can reduce the completion time of a task, since there is no communication cost when the predecessors of the task are duplicated on the same processor.
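The LWB applicability condition can be checked mechanically. A sketch, assuming a plain dictionary layout (w for computation costs, c for edge communication costs, pred for immediate-predecessor sets); the function name and argument layout are our own:

```python
def satisfies_lwb_condition(w, c, pred):
    """For every join task v, check that each immediate predecessor's
    computational cost is at least the largest communication cost
    incoming to v -- the small-communication condition described above.

    w[v]: computation cost of task v
    c[(u, v)]: communication cost on edge (u, v)
    pred[v]: set of immediate predecessors of v
    """
    for v, ps in pred.items():
        if len(ps) < 2:
            continue  # the condition constrains join tasks only
        max_comm = max(c[(u, v)] for u in ps)
        if any(w[u] < max_comm for u in ps):
            return False
    return True
```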
Improvements by Darbha and Agrawal [5] to the LWB algorithm's optimality criterion allowed for the inclusion of DAGs with a wider range of communication and computational costs, in their task duplication based scheduling (TDS) algorithm. The optimality criterion is as follows: if join task v_i has k immediate predecessors, and immediate predecessors v_m and v_n have the highest and second highest values with respect to ready times or data arrival times of v_i, then an optimal schedule is guaranteed with the earliest start time (est) and earliest completion time (ect) given by est(v_i) = max{ect(v_m), ect(v_n) + c(v_n, v_i)} and ect(v_i) = est(v_i) + w(v_i), provided that one of the conditions on ect(v_m), ect(v_n) and c(v_n, v_i) given in [5] is met. Park and Choe [9] proposed an improved TDS algorithm, which we will refer to as the ITDS algorithm in this paper. The ITDS algorithm is guaranteed to obtain, for any task v_i, the earliest start time est(v_i) and the earliest completion time ect(v_i). The ITDS approach is as follows: for any task v_i, there is a corresponding cluster C_i which is obtained by a merging operation on its immediate predecessors' clusters. Therefore, if task v_i has k immediate predecessors v_1, ..., v_k, with corresponding clusters C_1, ..., C_k, sorted in order of ready times (the ready time of v_j with respect to v_i is denoted r_j and is equal to ect(v_j) + c(v_j, v_i)), then est(v_i) and ect(v_i) are computed by merging the first m of these clusters, where m is the smallest index between 1 and k determined by the merging condition of [9]. However, est(v_i) and ect(v_i) are optimal only if the condition given below is satisfied; that is, if the communication overhead is relatively high. The optimality check is done as follows. For every task in C_i, find its earliest completion time. Next, for each immediate successor of this task found in C_i, find the maximum of: the communication cost to, and the computational cost of, the immediate successor, plus the computational cost of each descendant of this immediate successor also found in C_i. Add this maximum to the earliest start time; if the smallest of these totals is greater than or equal to the completion time of C_i, then C_i is optimal.
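Under the common TDS convention that the predecessor with the highest ready time is duplicated onto the join task's processor (so its communication cost vanishes), the est/ect computation might look as follows. This is a sketch only; the function name and argument layout are our own:

```python
def est_ect_join(v, preds, ect, w, c):
    """Earliest start/completion time of join task v when the predecessor
    with the highest ready time r_j = ect[j] + c[(j, v)] is placed on the
    same processor as v (avoiding its communication cost), while the
    runner-up still communicates.  preds must hold at least two tasks."""
    # rank immediate predecessors by ready time, highest first
    ranked = sorted(preds, key=lambda j: ect[j] + c[(j, v)], reverse=True)
    m, n = ranked[0], ranked[1]
    # m is co-located with v, so only its completion time matters;
    # n's data must still travel, so its full ready time applies
    est = max(ect[m], ect[n] + c[(n, v)])
    return est, est + w[v]
```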
The ITDS algorithm has several drawbacks. Firstly, it does not always produce an optimal solution, since the optimality criterion for a join task is not taken into account during scheduling. Secondly, it does not specify how to obtain an optimal solution when the schedule it produces is not optimal. Finally, the condition that the communication cost must be relatively high is not clearly defined, and there are occasions when the ITDS algorithm finds an optimal schedule with tasks having a communication cost as low as one (1) unit. This paper presents a task scheduling algorithm which uses duplication to optimally schedule any application represented in the form of a directed acyclic graph (DAG).
Ruan et al. [11] proposed an algorithm which took the drawbacks mentioned above into account, and reported further improvements over the ITDS algorithm. However, Ruan et al.'s algorithm does not always produce an optimal solution. Firstly, their approach assumes that the earliest start time for a join task is always found in the cluster with the longest ready time. The example in Fig. 1 illustrates why their approach does not always produce an optimal solution.
For the DAG given in Fig. 1a, Fig. 1b shows the clusters and which are the immediate predecessors of task , each having a completion time of 10. Since has the longest ready time to task , it should be considered as a possible candidate with respect to where should be scheduled. With Ruan et al.'s approach [11], is considered as the only candidate. Therefore, to reduce the communication cost of 10 from , is merged with . The algorithm then proceeds to check the immediate predecessor of task . It can be seen that , but the start time of task on is 10, and therefore this leads to having to be merged with . In Fig. 1c (i), the algorithm proposed by Ruan et al. [11] finds . In contrast, if cluster is considered, as shown in Fig. 1c (ii), it can be shown that , as follows. First, is merged with , and then task is considered. Since and the start time of task on is also 10, there is no need to schedule on . Secondly, Ruan et al.'s algorithm [11] does not always take advantage of the opportunity to reduce the start time when a task and its immediate successor are scheduled on different processors. In the above example, if instead of 14, there would have been no change to the earliest start time of task on with Ruan et al.'s approach, and it would have remained 13. Clearly on , if is considered. It may also be noted that since the tasks scheduled on a particular processor are not always sorted by task numbers, it appears that the complexity of Ruan et al.'s algorithm [11] will degenerate to in the worst case.
The algorithm proposed in this paper does not require any assumptions about the computation and communication costs of the tasks in the DAG, unlike those used in [5], [9]. At the same time, unlike the algorithm of Ruan et al. [11], the proposed algorithm always produces the shortest schedule. It is shown that the proposed algorithm has a time complexity of , where |V| represents the number of tasks and d the maximum indegree of tasks. The rest of this paper is organized as follows. The basis of the algorithm is described in Section 2. The proposed algorithm is presented in Section 3. In Section 4 we establish that the algorithm is optimal. Section 5 provides the complexity analysis of the algorithm and a comparison with other algorithms. Section 6 provides an illustrative example of the proposed algorithm. Section 7 presents a comparative study of the proposed algorithm with other works. Finally, Section 8 concludes the paper.
Notation and terminology
The proposed algorithm uses the following notation and terminology:
p_i: processor on which task v_i is the last task to be completed
est(v_i, p_j): earliest start time of task v_i on processor p_j
est(v_i): earliest start time of task v_i; the same as est(v_i, p_i)
ect(v_i, p_j): completion time of task v_i on processor p_j
ect(v_i): earliest completion time of task v_i; equal to ect(v_i, p_i)
st(v_i, p_j): start time of task v_i on processor p_j
ct(v_i, p_j): completion time of task v_i on processor p_j
r(v_i): ready time of task v_i
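The bookkeeping implied by this notation can be held in plain dictionaries. A hypothetical sketch — the map names are ours, and the earliest start time of a task is taken here as the minimum over all copies of a (possibly duplicated) task:

```python
# Hypothetical per-(task, processor) tables mirroring the notation above.
schedule = {
    "est": {},        # est[(v, p)]: earliest start time of task v on processor p
    "ect": {},        # ect[(v, p)]: earliest completion time of v on p
    "tasks_on": {},   # tasks_on[p]: ordered list of tasks scheduled on processor p
}

def record(sched, v, p, start, cost):
    """Record that task v runs on processor p starting at `start`."""
    sched["est"][(v, p)] = start
    sched["ect"][(v, p)] = start + cost
    sched["tasks_on"].setdefault(p, []).append(v)

def earliest_start(sched, v):
    """Earliest start time of v over all processors it was placed on."""
    return min(t for (u, p), t in sched["est"].items() if u == v)
```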
The algorithm
A detailed description of the algorithm is given below.
Algorithm Scheduling
// Input: A DAG G = (V, E), with its entry task
// Output: The earliest start time est(v_i) and the earliest completion time ect(v_i) for each
// task v_i ∈ V. It also produces, for each processor p_i, a list of tasks to be
// scheduled on it.
set the est of the entry task to 0;
set the ect of the entry task to its computational cost;
L ← topological ordering of nodes in the DAG;
while L ≠ ∅ do
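A sketch of this driver loop: a topological ordering (here via Kahn's algorithm) followed by a per-task est/ect computation. The join-task rule shown, duplicating only the predecessor with the highest ready time, is a simplified stand-in for the paper's schedule_task procedure, not the paper's actual method:

```python
from collections import deque

def schedule(w, c, succ, pred):
    """Illustrative driver only.  w[v]: computation cost; c[(u, v)]:
    communication cost; succ[v]/pred[v]: immediate successors/predecessors."""
    # Kahn's algorithm for a topological ordering of the tasks
    indeg = {v: len(pred.get(v, ())) for v in w}
    queue = deque(v for v in w if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in succ.get(v, ()):
            indeg[u] -= 1
            if indeg[u] == 0:
                queue.append(u)

    est, ect = {}, {}
    for v in order:
        ps = list(pred.get(v, ()))
        if not ps:                    # entry task starts at time 0
            est[v] = 0
        else:
            # duplicate the predecessor with the highest ready time onto
            # v's processor; all other predecessors still communicate
            r = {j: ect[j] + c[(j, v)] for j in ps}
            m = max(ps, key=r.get)
            others = [r[j] for j in ps if j != m]
            est[v] = max([ect[m], *others])
        ect[v] = est[v] + w[v]
    return est, ect
```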
Optimality condition
In this section we establish that the algorithm schedule_task() is optimal. Theorem 1: For any join task v_i with k immediate predecessor tasks, the proposed algorithm constructs the cluster C_i correctly, and est(v_i) can be obtained using the following equation
Complexity analysis
Using a breadth-first search to obtain a ready task, the complexity of the ready task search is . This search is done by algorithm Scheduling(), which also calls procedure schedule_task() for the calculation of the earliest start time of each task v_i ∈ V. The procedure schedule_task() is executed |V| times. Let d be the maximum indegree of a task; then any join task has at most d immediate predecessors. Although , for all practical purposes . The procedure schedule_
An illustrative example
For a given DAG, Fig. 6a–d shows how the proposed algorithm schedules each task optimally. The is 0. The values , and , the immediate successors of , are easily calculated since they have only one immediate predecessor, therefore are 2, 6, 4, 6 and 3 respectively as shown in Fig. 6b. Next we consider task which is a join task and has three immediate predecessors. These immediate predecessors, in order of
Performance and comparison
The schedule length, also referred to as the makespan (see, e.g., [2], [11]), is the main performance measure of a scheduling algorithm. However, since a large set of task graphs with different properties is used in our experiments, it becomes necessary to normalize the schedule length with respect to the critical path. This normalized value of the schedule length is referred to as the schedule length ratio (SLR). The SLR value is defined as follows. The minimum critical
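The SLR definition is truncated above; SLR is commonly taken as the schedule length divided by the sum of the computation costs along the critical path (the minimum possible length when communication is ignored). A sketch under that assumed definition:

```python
from functools import lru_cache

def critical_path_cost(w, succ, pred):
    """Sum of computation costs along the most expensive path, ignoring
    communication -- a lower bound on any schedule length."""
    @lru_cache(maxsize=None)
    def down(v):
        # cost of v plus the heaviest downstream chain
        return w[v] + max((down(u) for u in succ.get(v, ())), default=0)

    entries = [v for v in w if not pred.get(v)]
    return max(down(v) for v in entries)

def slr(schedule_length, w, succ, pred):
    """Schedule length ratio: makespan normalized by the critical path."""
    return schedule_length / critical_path_cost(w, succ, pred)
```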
Conclusion
In this paper, a task scheduling algorithm which uses duplication to optimally schedule any application represented in the form of a directed acyclic graph (DAG) was proposed. The proposed algorithm does not require any assumptions about the computation or communication costs of the tasks, unlike those in [5], [9]. Moreover, the proposed algorithm always finds the best possible schedule, which may not always be the case with Ruan et al.'s algorithm [11]. The proposed algorithm produces an
Acknowledgements
The authors wish to thank the anonymous referees for their comments and suggestions on an earlier version of the manuscript which have greatly enhanced the readability of the paper. The authors also wish to thank Prof. L. Moseley for his interest and support in this work.
References (11)
- et al., Optimization and approximation in deterministic sequencing and scheduling: a survey, Ann. Discr. Math. (1979)
- et al., Analysis and evaluation of heuristic methods for static scheduling, J. Parallel Distr. Comput. (1990)
- et al., On exploiting task duplication in parallel program scheduling, IEEE Trans. Parallel Distr. Syst. (1998)
- et al., An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems, IEEE Trans. Parallel Distr. Syst. (2003)
- P. Chaudhuri, J. Elcock, Scheduling on multiprocessors using task duplication, in: Proceedings of the 2005 Design,...
Pranay Chaudhuri received his B.Sc. (Physics) and B.Tech. (Electronics) from Calcutta University. He obtained his M.E. and Ph.D., both in Computer Science & Engineering, from Jadavpur University, Calcutta. At present he is holding the position of Professor of Computer Science and Head of the Department of Computer Science, Mathematics and Physics at the University of the West Indies, Cave Hill Campus, Barbados. Professor Chaudhuri has held faculty positions at the Indian Institute of Technology, James Cook University of North Queensland, University of the New South Wales, and Kuwait University prior to joining the University of the West Indies in 2000. Professor Chaudhuri’s research interests include Parallel and Distributed Algorithms, Self-stabilization, and Graph Theory. He has published extensively in leading international journals and is the author of a book entitled, Parallel Algorithms: Design and Analysis (Prentice-Hall, 1992). Professor Chaudhuri is the recipient of several national and international awards for his research contributions. He is also the recipient of the Vice-Chancellor’s Award for Excellence 2007 at the University of the West Indies for all-round excellent performance in research and service to the University community.
Jeffrey Elcock received his B.Sc. (Mathematics and Computer Science) from the University of the West Indies, Cave Hill Campus, Barbados, and M.Sc. (Computation) from Oxford University. His research interests include Algorithms and Complexity, Distributed Systems and Grid Computing. Currently, Mr. Elcock is pursuing his doctoral research in Computer Science at the University of the West Indies.