Dynamic mapping of cooperating tasks to nodes in a distributed system

https://doi.org/10.1016/j.future.2004.05.032

Abstract

Networks of workstations (NOWs) are a low-cost and widespread platform for parallel computing. This paper focuses on the dynamic task-scheduling problem in NOW environments. The aim is to minimize the completion time of parallel programs by distributing cooperating concurrent tasks to homogeneous networked nodes. Cooperation dependencies as well as creation and termination dependencies between tasks are taken into account. An event lattice model is introduced to describe the past, current, and future behavior of a parallel program in execution. Based on this model, an algorithm is presented that dynamically assigns tasks to the nodes of a dedicated distributed system. Crucial to the efficiency of this approach is a top-down construction of all operating system entities involved in distributed resource management, particularly the close cooperation of the compiler and runtime system, which allows the creation of event lattice clippings at runtime.

Introduction

Networks of workstations (NOWs) have become a widespread and cost-effective platform for parallel computing. They are usually created through the use of a switch-based interconnection network with switch ports connected to other switch ports or to processors through an I/O bus interface. As this network technology was not initially developed for parallel processing but for infrequent point-to-point transfers of large volumes of data, the communication overhead among workstations is quite high. Message passing typically requires the intervention of the operating system at each send or receive operation. Such interventions dramatically increase the latency of the operation to the extent that the interconnection network capabilities become secondary. As a result, it is difficult to obtain performance gains in network-based cluster computing.

Scheduling tasks to nodes in NOW environments is a difficult and challenging problem. Without good task assignment, the distributed execution of a program can take even longer than the execution of the same program on a single node (see Section 6). As the global scheduling problem is NP-complete in its general form [1], [2], [3], no efficient algorithm for finding optimal solutions is known unless restrictions are imposed. Researchers have explored a number of approximation methods and heuristics [4], [5], [6], [7], each producing good results under different circumstances. Most research projects concentrate on static scheduling, where the mapping of tasks to processors is done before program execution begins. Static scheduling methods are usually processor non-preemptive.
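The claim that a poor assignment can be slower than single-node execution can be illustrated with a deliberately crude cost model. The sketch below is not the paper's model: the function `completion_time`, the task names, the per-message `latency`, and the rule that each remote message delays both endpoints are all assumptions chosen only for illustration.

```python
from typing import Dict, Tuple


def completion_time(
    compute: Dict[str, float],
    messages: Dict[Tuple[str, str], int],
    placement: Dict[str, int],
    latency: float,
) -> float:
    """Estimate completion time under a hypothetical, simplified cost model.

    Each node executes its tasks sequentially. Every message exchanged
    between tasks on different nodes adds `latency` to both nodes involved;
    messages between tasks on the same node are treated as free.
    """
    node_time: Dict[int, float] = {}
    for task, node in placement.items():
        node_time[node] = node_time.get(node, 0.0) + compute[task]
    for (a, b), count in messages.items():
        if placement[a] != placement[b]:
            cost = count * latency
            node_time[placement[a]] += cost
            node_time[placement[b]] += cost
    return max(node_time.values())


# Illustrative workload (assumed values): four tasks, two cooperating pairs.
COMPUTE = {"t1": 4.0, "t2": 4.0, "t3": 4.0, "t4": 4.0}
MESSAGES = {("t1", "t2"): 10, ("t3", "t4"): 10}
LATENCY = 1.0

SINGLE_NODE = {"t1": 0, "t2": 0, "t3": 0, "t4": 0}
SPLIT_PAIRS = {"t1": 0, "t2": 1, "t3": 0, "t4": 1}   # separates cooperating tasks
KEEP_PAIRS = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}    # co-locates cooperating tasks

if __name__ == "__main__":
    for name, placement in [
        ("single node", SINGLE_NODE),
        ("split pairs", SPLIT_PAIRS),
        ("keep pairs", KEEP_PAIRS),
    ]:
        print(name, completion_time(COMPUTE, MESSAGES, placement, LATENCY))
```

Under these assumed numbers, separating the cooperating pairs across two nodes is slower than running everything on one node, while co-locating each pair halves the single-node time. The real behavior depends on the actual synchronization semantics, which this sketch does not capture.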

This paper presents a decentralized algorithm that dynamically (online) maps cooperating tasks to the machines of a dedicated distributed environment. Tasks may cooperate in a client–server style by synchronous rendezvous (message passing). In addition, new tasks can be created dynamically using fork-like operations (explicit task parallelism). Delays resulting from implicit start and stop synchronization, which takes place between a generating and a newly generated task as well as between a terminated task and its creator, are taken into account. The model normally used to describe the input for global scheduling is a directed acyclic graph (DAG). A DAG is not adequate to describe cooperation between tasks other than start and termination. Therefore, a model based on event lattices has been developed to specify the input for our global scheduling algorithm. The model is introduced in Section 4. In our MoDiS project (see Section 3), the generation of the model is prepared by static compiler analysis. The information is augmented dynamically and used by the runtime system whenever a new task is forked.

The rest of the paper is organized as follows. The next section briefly discusses related work. Section 3 describes the environment targeted by the scheduling algorithm presented in this paper. Section 4 presents a parallel task model based on event lattices. This model is later used to describe the input for the scheduling decision. It can be derived from the output of the compiler analysis combined with some information from the runtime system. In Section 5, a scheduling algorithm for deterministic parallel programs without conditional branches and while loops is presented. The quality of the algorithm is analyzed in Section 6. Section 7 shows a way to apply this approach to programs with nondeterministic future behavior by using predictions. Section 8 summarizes the results of this paper.

Section snippets

Related work

Many publications exist in the field of global task scheduling. They can be separated into two categories: mathematical approaches that make unrealistic and impractical assumptions, and bottom-up oriented attempts [8], [9] that make placement or migration decisions based on runtime measurements such as processor load or run queue length.

Some of the mathematical approaches deal with optimal solutions and provable facts, others with the design and evaluation of heuristics with polynomial time

The MoDiS approach

The global scheduling decision presented in this paper has been developed in the context of the distributed system architecture MoDiS (Model oriented Distributed System) [14], [15], which is characterized by a top-down driven and language-based approach combined with structuring facilities. New applications dynamically extend the running system, forming a global structure which can be exploited to improve distributed resource management. The grammar for the formal MoDiS concepts is provided by the

Event lattices

Different representations are used to model the input for task scheduling in parallel and distributed systems. When precedence constraints among tasks need to be enforced, parallel programs are often represented as a directed acyclic graph (DAG). If there are no precedence constraints but communication dependencies among tasks, an undirected graph called a task interaction graph (TIG) [17] is normally used. None of these models is sufficient to describe the characteristics of INSEL programs.
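The difference between the two conventional representations can be made concrete with a small sketch. The task names, edge weights, and the helper `tig_weight` below are hypothetical; the code only illustrates that a DAG encodes ordering without ongoing cooperation, whereas a TIG encodes cooperation volume without ordering, which is why neither alone covers tasks that both fork/terminate and communicate while running.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, FrozenSet, Set

# DAG: each task maps to the set of tasks that must finish before it starts.
# Directed edges express precedence only (assumed example tasks a..d).
dag: Dict[str, Set[str]] = {
    "a": set(),
    "b": {"a"},
    "c": {"a"},
    "d": {"b", "c"},
}

# TIG: undirected edges weighted by communication volume (assumed units).
# frozenset keys make the edge {u, v} identical to {v, u}.
tig: Dict[FrozenSet[str], int] = {
    frozenset({"a", "b"}): 3,
    frozenset({"b", "c"}): 7,
}


def tig_weight(graph: Dict[FrozenSet[str], int], u: str, v: str) -> int:
    """Return the communication volume between u and v, or 0 if none."""
    return graph.get(frozenset({u, v}), 0)


if __name__ == "__main__":
    # A DAG yields a valid execution order but says nothing about
    # message exchange between concurrently running tasks.
    print(list(TopologicalSorter(dag).static_order()))
    # A TIG yields pairwise communication volume but no ordering.
    print(tig_weight(tig, "c", "b"))
```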

Scheduling deterministic programs

After introducing the model to describe the past and expected future progression of a MoDiS system in Section 4, this section concentrates on the algorithm to place newly created actors on the nodes of a distributed system. We only consider deterministic programs without while-loops and conditional branches. This assumption reduces the number of possible future program execution paths to one. To schedule nondeterministic programs, the algorithm can be applied to a number of possible execution

Evaluation

The quality of global scheduling in MoDiS has been evaluated by comparing the placement decision based on the sd–lp-ratio with a random strategy, a server-initiated bidding strategy (SBidding), and a receiver-initiated bidding strategy (RBidding). For this purpose, INSEL programs have been classified into two categories, which are defined by the intensity of actor cooperation. The results are summarized in Figs. 3 and 4. In both diagrams, the number of nodes is plotted on the abscissa, the ordinate shows the normalized

Extensions for nondeterminism

The results from the last section show that, without regarding future behavior when scheduling cooperating tasks, it is impossible to exploit the potential of parallel and distributed hardware architectures. Hence, a problem arises when scheduling nondeterministic programs in a distributed environment where different future behaviors are possible. The nondeterminism is mainly a result of conditional branches and while loops and makes it difficult for the management to predict the future behavior

Conclusion

The dynamic scheduling strategy presented in this paper makes a trade-off between cost and quality. By taking only a small clipping of the future behavior of a parallel program into account and by using a greedy strategy, the computation costs for the placement decision are reduced. As a consequence, suboptimal schedules are sometimes chosen. Tests have shown that our approach outperforms scheduling strategies that do not use any knowledge about the parallel application. The expense for rating the nodes

References (17)

  • L. Wang et al.

    Task matching and scheduling in heterogeneous computing environments using a genetic-algorithm-based approach

    J. Parallel Distributed Comput.

    (1997)
  • M.R. Garey et al.

    Computers and Intractability—A Guide to the Theory of NP-Completeness

    (1979)
  • P. Chretienne

    Task scheduling over distributed memory machines

  • V. Sarkar

    Partitioning and Scheduling Parallel Programs for Multiprocessors

    (1989)
  • Y.-K.K.I. Ahmad et al.

    Analysis, evaluation, and comparison of algorithms for scheduling task graphs on parallel processors

  • T.L. Casavant et al.

    A taxonomy of scheduling in general-purpose distributed computing systems

    IEEE Trans. Software Eng.

    (1988)
  • E.G. Coffman

    Computer and Job-Shop Scheduling Theory

    (1976)
  • H. El-Rewini et al.

    Task scheduling in multiprocessing systems

    Computer

    (1995)
There are more references available in the full text version of this article.

Cited by (2)

  • Task distribution with a random overlay network

    2006, Future Generation Computer Systems
    Citation Excerpt :

    Similar designs are proposed in [2,6,9]. The Wire Speed Grid Project [23] proposes an architecture in which the task allocation is performed hardware accelerated on the network routers. [11] proposes a model for the allocation of cooperating tasks on networks of workstations.

  • An improved algorithm for Alhusaini's algorithm in heterogeneous distributed systems

    2007, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Christian Rehn received his Master's Degree in computer science from the Technische Universität München, Germany, in 1998. Since 1998, he has been a research staff member of the Distributed Operating Systems Group headed by Professor Dr. Peter Paul Spies of the computer science department at the Technische Universität München, where he participates in the basic research program “Methods and Concepts for the Construction of Distributed, Parallel and Cooperative Systems”. He received his PhD in computer science from the Technische Universität München in 2004. His major research activities focus on the design and implementation of the language-based distributed operating system MoDiS-OS. Besides his academic involvement, Christian Rehn is managing director of itestra GmbH, which has a major focus on software maintenance and evolution. His major research interests are parallel and distributed processing, scalable and adaptable distributed resource management, compilation and dynamic evolution in distributed computing environments.
