Memory limited algorithms for optimal task scheduling on parallel systems

https://doi.org/10.1016/j.jpdc.2016.03.003

Highlights

  • Solves the multi-processor task scheduling problem with communication delays.

  • Two new memory limited optimal task scheduling algorithms are proposed.

  • Their implementations have a small memory footprint.

  • Complementary approaches to the optimal solution, from below and above.

  • Computes non-optimal but quality-guaranteed solutions on request.

Abstract

To fully benefit from a multi-processor system, tasks need to be scheduled optimally. Given that the task scheduling problem with communication delays, P|prec,c_ij|C_max, is a well-known strongly NP-hard problem, exhaustive approaches are necessary to solve it optimally. The previously proposed A*-based algorithm retains its entire state space in memory and often runs out of memory before it finds an optimal solution. This paper investigates and proposes two memory limited optimal scheduling algorithms: Iterative Deepening A* (IDA*) and Depth-First Branch and Bound A* (BBA*). When finding a guaranteed near-optimal schedule length is sufficient, the proposed algorithms can be combined, reporting the gap while they run. Problem-specific pruning techniques, which are crucial for good performance, are studied for the two proposed algorithms. Extensive experiments are conducted to evaluate and compare the proposed algorithms with previous optimal algorithms.

Introduction

The problem of scheduling tasks with precedence constraints and communication delays onto a homogeneous multi-processor system, with the objective of minimising the overall finish time, is fundamental to speeding up task execution on such a system. The problem addressed is the classic problem of scheduling task graphs on parallel systems with communication delays, which is P|prec,c_ij|C_max in the α|β|γ notation [15], [41]. Optimal scheduling is a well-known hard problem (an NP-hard optimisation problem [34]), as the time needed to solve it optimally grows exponentially with the number of tasks. A number of heuristics have been proposed for this classical problem, but they aim to produce good rather than optimal schedules, e.g. [24], [28], [16], [31], [37], [44], [46], [7], [18]. For the classical scheduling problem, no α-approximation is known [12]. The only known guaranteed approximation algorithm, in [18], has an approximation factor that depends on the communication costs along the longest path in the schedule. While heuristics often provide good results, there is no guarantee that the solutions are close to optimal, especially for task graphs with high communication costs [40], [39].

Optimal schedules make a significant difference where a schedule is reused many times and in time-critical systems, with applications in flight control, industrial automation, telecommunication systems, consumer electronics, robotics and multimedia systems. This is especially true for schedules in embedded systems, where they are used many times during the system's life cycle. The lack of good guaranteed approximation algorithms makes this very relevant. Moreover, having optimal solutions for scheduling instances makes it possible to better judge the quality of heuristics and thereby to gain insight into their behaviour. For other NP-hard problems it is sometimes possible to compute the optimal solution for practically relevant instances in reasonable time, e.g. for the Travelling Salesman Problem [1]. With our work on optimal scheduling, we investigate whether we can solve larger problem instances in a reasonable amount of time. The work presented in this paper addresses the optimal solution of the P|prec,c_ij|C_max scheduling problem. This is one of the classic models for task scheduling, with a wealth of results [22], [23], [43], [8], [10]. As such, solving it optimally is of interest to the community and also of practical use; for example, task-based linear algebra solvers like StarPU [3], [2] use such a model with a simple scheduling heuristic.

Previous approaches to optimally solve this scheduling problem are based on Mixed Integer Linear Programming (MILP) formulations [10], [42] and on smart state space enumeration using A* [22]. Both approaches have shown strengths and weaknesses. The MILP formulations are less efficient for small numbers of processors and when communication costs are high [43]. They also require powerful solvers, which act as black boxes and make performance prediction difficult. The compact A* scheduling algorithm is very well suited to injecting problem-specific knowledge to prune the search space, which is crucial for an efficient search [38]. However, the approach suffers from the well-known drawback of A*: it runs out of memory very quickly. ILP-based solvers use Linear Programming (LP) relaxations instead of the exhaustive search of the entire solution space performed by A* and its memory limited variants proposed here. The use of different techniques to solve the scheduling problem addressed here makes it worthwhile to investigate and compare their performance.

The main contributions of this paper are (1) to overcome the memory limitations by proposing two new algorithms: the Iterative Deepening A* (IDA*) scheduling algorithm, which employs iterative deepening to limit the memory utilisation of the algorithm, and the Depth-First Branch and Bound A* (BBA*) scheduling algorithm, which improves the solution as the search traverses the state space of the P|prec,c_ij|C_max scheduling problem; (2) to propose an exhaustive search approach based on the two algorithms that finds a feasible schedule whose quality with respect to the schedule length is guaranteed, updating the current quality while running; (3) to improve the initial lower bound on the schedule length to reduce the number of states revisited by IDA* during iterative deepening; and (4) to investigate and propose new pruning techniques for generic and structure specific graphs to significantly reduce the state (solution) space.
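
As a toy illustration of contribution (2), the sketch below shows how a quality guarantee can be derived whenever a feasible schedule length (an upper bound, as produced by BBA*) and a valid lower bound (as maintained by IDA*) are available at the same time; the function name and the exact gap definition are illustrative and not taken from the paper.

    # A toy sketch of the quality-gap idea (function name illustrative): any
    # feasible schedule length F together with a valid lower bound LB on the
    # optimal length OPT satisfies F/OPT <= F/LB, so this relative gap bounds
    # how far the feasible schedule can be from optimal.
    def quality_gap(feasible_length, lower_bound):
        return (feasible_length - lower_bound) / lower_bound

    # Example: feasible_length = 42, lower_bound = 40 gives a gap of 5%,
    # i.e. the schedule is guaranteed to be at most 5% longer than optimal.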

The rest of the paper is organised as follows. Section 2 reviews related work on other approaches used to optimally solve this scheduling problem. Section 3 discusses the task scheduling model and Section 4 details the proposed IDA* and BBA* scheduling algorithms and supplies different methods to improve the f-function calculations that guide the algorithms. The section also proposes the gap calculation method to find a feasible schedule with a guaranteed quality with respect to the schedule length. Section 5 proposes a method to find a good initial lower bound, close to the optimal schedule length, in order to speed up the execution of IDA*. Section 6 analyses existing state space pruning techniques and investigates novel ones, which are essential to reduce the runtime of the algorithms. Duplicate avoidance without memory and processor normalisation without memory are the pruning techniques proposed in this paper. Section 7 evaluates and compares the performance of the proposed algorithms with previous approaches. Section 8 concludes by highlighting the main results of the paper and the significance of using memory limited algorithms to solve the task scheduling problem.

Section snippets

Related work

Given the NP-hardness of the problem, few attempts have been made to solve it optimally. The solution space for the scheduling problem is spanned by all possible processor assignments combined with all possible task orderings. This search space grows exponentially, making exhaustive enumeration impractical even for small task graphs. In this section we discuss the Mixed Integer Linear Programming (MILP) formulations and the A* algorithm for the task scheduling problem. We observe the strengths and weaknesses of each

Task scheduling model

Formally, the tasks to be scheduled are represented by a directed acyclic graph (DAG) defined by a 4-tuple G = (V, E, C, L), where V denotes the set of tasks and E represents the set of edges. Each edge (i, j) ∈ E defines a precedence relation between the tasks i, j ∈ V. The model assumes a fully connected network of homogeneous processors P = {1, …, |P|} with identical communication links. Each processor may execute several tasks, but each task has to be assigned to exactly one processor, in which it is
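
For concreteness, the sketch below shows one possible plain data representation of such a task graph, assuming that C holds the edge communication costs and L the task computation costs; the class and field names are illustrative and not taken from the paper.

    # A minimal sketch of the model's data, assuming C = edge communication
    # costs and L = task computation costs; names are illustrative.
    from dataclasses import dataclass

    @dataclass
    class TaskGraph:
        tasks: set   # V: task identifiers
        edges: set   # E: precedence pairs (i, j), meaning i must finish before j starts
        comm: dict   # C: (i, j) -> communication delay on edge (i, j)
        work: dict   # L: i -> computation cost of task i

        def predecessors(self, j):
            return {i for (i, k) in self.edges if k == j}

    # Example: a small fork graph scheduled on |P| = 2 identical processors.
    g = TaskGraph(
        tasks={"a", "b", "c"},
        edges={("a", "b"), ("a", "c")},
        comm={("a", "b"): 3, ("a", "c"): 1},
        work={"a": 2, "b": 4, "c": 4},
    )
    processors = [1, 2]   # fully connected, homogeneous, identical links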

Memory limited optimal scheduling algorithms

In order to employ IDA* and BBA* to optimally solve the scheduling problem defined in the previous section, we formulate it as a combinatorial problem. Essentially, the solution space to be searched is created by generating all possible processor allocations combined with all possible task orders. The latter are constrained by the precedence relations of the tasks expressed through the edges of the task graph. Starting with an empty schedule, a new state (i.e. partial schedule) is created by selecting an
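
The sketch below illustrates this kind of state expansion, building on the TaskGraph sketch above. It assumes the standard rules of the model (a task is free once all its predecessors are scheduled, and a communication delay is only paid between different processors); the helper names are illustrative and the paper's exact formulation in Section 4 may differ.

    # A sketch of expanding a partial schedule into its child states.
    # scheduled maps task -> (processor, start time, finish time).
    def ready_tasks(graph, scheduled):
        # A task is free/ready once all of its predecessors are scheduled.
        return [t for t in graph.tasks
                if t not in scheduled
                and all(p in scheduled for p in graph.predecessors(t))]

    def earliest_start(graph, scheduled, task, proc):
        # Earliest start respects both processor availability and data-ready times.
        proc_free = max((f for (_, (q, _, f)) in scheduled.items() if q == proc),
                        default=0)
        data_ready = 0
        for pred in graph.predecessors(task):
            q, _, f = scheduled[pred]
            # Communication delay is only incurred across processors.
            delay = 0 if q == proc else graph.comm[(pred, task)]
            data_ready = max(data_ready, f + delay)
        return max(proc_free, data_ready)

    def expand(graph, processors, scheduled):
        # Children of a partial schedule: every ready task on every processor.
        for task in ready_tasks(graph, scheduled):
            for proc in processors:
                start = earliest_start(graph, scheduled, task, proc)
                child = dict(scheduled)
                child[task] = (proc, start, start + graph.work[task])
                yield child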

Lower bound for IDA* scheduling

A good lower bound close to the optimal schedule length is crucial to reduce the runtime of the IDA* scheduling algorithm. The tighter the lower bound, the fewer iterations are necessary, hence fewer states need to be regenerated. In the input to Algorithm 1, the lower bound on the schedule length is assigned to the round threshold. This section discusses an improved method to determine a lower bound on the schedule length. The best (i.e. maximum) of all calculated lower bounds is assigned to SL_LB.
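
To make the role of this threshold concrete, here is a minimal, generic sketch of an iterative-deepening driver, assuming an admissible cost function f (one that never overestimates the optimal schedule length); the improved lower bound of this section would be passed in as the initial threshold. The function names are illustrative and this is not the paper's Algorithm 1.

    # Generic iterative-deepening driver (a sketch, not the paper's Algorithm 1).
    import math

    def ida_star(root, f, expand_fn, is_complete, initial_lower_bound):
        threshold = initial_lower_bound        # start the first round at the lower bound
        while True:
            next_threshold = math.inf

            def dfs(state):
                nonlocal next_threshold
                cost = f(state)
                if cost > threshold:
                    # Prune, but remember the smallest f-value that exceeded
                    # the threshold: it becomes the next round's threshold.
                    next_threshold = min(next_threshold, cost)
                    return None
                if is_complete(state):
                    return state               # complete schedule within the bound
                for child in expand_fn(state):
                    found = dfs(child)
                    if found is not None:
                        return found
                return None

            result = dfs(root)
            if result is not None:
                return result
            if next_threshold == math.inf:
                return None                    # search space exhausted
            threshold = next_threshold         # deepen and re-search from the root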

State space pruning

In order to control the exponential explosion of states in the solution space, a number of pruning techniques are investigated in this section. Ideally, we want to employ previously proposed pruning techniques for state space search [22], [36]. However, these techniques often rely on complete and reliable duplicate detection, as done in A* with the Open and Closed lists [38]. In A*, newly created states are compared to all previously generated states and duplicates are dropped. IDA* and
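
As an example of a pruning rule that needs no stored state, the sketch below illustrates the idea behind processor normalisation: with identical processors, trying a free task on every empty processor only produces schedules that differ by a permutation of processors, so a single empty processor suffices. It reuses the expansion helpers sketched earlier; the paper's exact rule, and its memory-free duplicate avoidance, may differ in detail.

    # A sketch of processor normalisation without memory (the exact rule in
    # the paper may differ). With homogeneous processors, a free task only
    # needs to be tried on the processors already in use plus one empty one.
    def candidate_processors(processors, scheduled):
        used = sorted({proc for (proc, _, _) in scheduled.values()})
        empty = [p for p in processors if p not in used]
        return used + empty[:1]

    def expand_normalised(graph, processors, scheduled):
        for task in ready_tasks(graph, scheduled):
            for proc in candidate_processors(processors, scheduled):
                start = earliest_start(graph, scheduled, task, proc)
                child = dict(scheduled)
                child[task] = (proc, start, start + graph.work[task])
                yield child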

Experimental evaluation

This section performs an experimental evaluation of the two proposed algorithms, using two approaches. First, we conduct a performance comparison of the proposed IDA* and BBA* task scheduling algorithms against the A* scheduling algorithm in [38] and the MILP formulations SHD-RELAXED and SHD-REDUCED in [43] with a 1 min time-out, varying the number of target processors among 2, 4, 8 and 16. The reason for the 1 min time limit is that we wanted to be able to run many experiments

Conclusion

This paper proposed two new optimal scheduling algorithms named IDA* and BBA*. In contrast to previous algorithms, they use limited memory and their implementations have a small memory footprint. Their approaches are complementary, approaching the optimal solution from below and above. Exploiting this nature, they can be innovatively employed to compute non-optimal but quality-guaranteed solutions. The destructive lower bound was introduced for the proposed algorithms along with other pruning

Acknowledgments

We gratefully acknowledge the anonymous reviewers for helping to improve the quality of the paper. This work is supported by the Marsden Fund Council from Government funding, Grant 9073-3624767, administered by the Royal Society of New Zealand, with additional postgraduate support from the University of Auckland.

References (46)

  • Cédric Augonnet et al., StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, Concurr. Comput.: Pract. Exper. (2011)
  • Armin Bender, MILP based task mapping for heterogeneous multiprocessor systems
  • Peter Brucker et al., A linear programming and constraint propagation-based lower bound for the RCPSP, European J. Oper. Res. (1998)
  • Parallel machines—linear programming and enumerative algorithms, Ann. Oper. Res. (1986)
  • E.G. Coffman et al., Optimal scheduling for two-processor systems, Acta Inform. (1972)
  • Abhijit Davare et al., Classification, customization, and characterization: Using MILP for task allocation and scheduling, Technical Report UCB/EECS-2006-166 (2006)
  • Tatjana Davidović et al., Mathematical programming-based approach to scheduling of communicating tasks, Technical report (2004)
  • T. Davidović, L. Liberti, N. Maculan, N. Mladenovic, Towards the optimal solution of the multiprocessor scheduling...
  • Rina Dechter et al., Generalized best-first search strategies and the optimality of A*, J. ACM (1985)
  • Maciej Drozdowski, Scheduling for Parallel Processing (2009)
  • Christodoulos A. Floudas et al., Mixed integer linear programming in process scheduling: Modeling, algorithms, and applications, Ann. Oper. Res. (2005)
  • Satoshi Fujita, A branch-and-bound algorithm for solving the multiprocessor scheduling problem with improved lower bounding techniques, IEEE Trans. Comput. (2011)
  • R.C. Hull, A. Winter, A short introduction to the gxl software exchange format, in: Reverse Engineering, 2000....

Sarad Venugopalan received his Bachelor degree in Computer Science and Engineering from Siddaganga Institute of Technology, Karnataka, India and a Master (by Research) degree in Wireless Communication from Madras Institute of Technology, Anna University Chennai, India. He is a Marsden Fund scholar and is pursuing his Ph.D. at the University of Auckland, New Zealand, working on Optimal Task Scheduling for Parallel Systems. His other research interests include computer algorithms and cryptology.

Oliver Sinnen received his Diploma degree in electrical and computer engineering from RWTH Aachen University, Germany. He completed his Ph.D. at Instituto Superior Técnico (IST), Technical University of Lisbon, in 2003. Since 2004, he has been a (Senior) Lecturer with the Department of Electrical and Computer Engineering at the University of Auckland, New Zealand. He authored the book “Task Scheduling for Parallel Systems”, published by Wiley. His research interests include parallel computing and programming, scheduling and reconfigurable computing.
