Design and analysis of a non-preemptive decentralized load balancing algorithm for multi-class jobs in distributed networks

https://doi.org/10.1016/j.comcom.2004.02.002

Abstract

In this paper, we propose a static, decentralized load balancing algorithm for handling multi-class jobs in distributed network systems, with the objective of minimizing the mean response time of a job, using the concept of virtual routing. We formulate the problem as a constrained non-linear minimization problem with job flow-rates, communication delays, and processing delays as constraints. We employ a novel approach to transform the formulated problem into an equivalent routing problem and propose an algorithm, referred to as load balancing via virtual routing (LBVR), to seek an optimal solution, whenever it exists. We show that the design of the proposed algorithm subsumes several interesting properties and is guaranteed to deliver a super-linear rate of convergence in obtaining an optimal solution, whenever it exists. Also, when the variation of mean link delays is assumed to be a convex function, we show that the solution generated by our LBVR algorithm is indeed an optimal solution, whereas, when the above variation is assumed to be non-convex, we derive a necessary condition for an optimal solution. With rigorous experiments, we test our algorithm in terms of its rate of convergence and quality of solution to quantify its performance. We demonstrate the complete workings of our algorithm using an illustrative example in a systematic fashion, for ease of understanding.

Introduction

Minimizing the mean response time (MRT) of the jobs submitted for processing in a parallel/distributed system is critical to improving the overall performance of the system. Load balancing algorithms strive to meet this objective of minimizing the MRT, the average time interval between the time instant at which a job is submitted and the time instant at which the job leaves the system after processing. The design of such load balancing algorithms, in general, considers several influencing factors, for instance, the underlying network topology, job arrival rates at each processor in the system, and communication network bandwidth/traffic. Further, job characteristics may vary in several ways, such as priority assignment for jobs in processing and jobs with or without deadlines. In the existing literature, several combinations of the above factors, together with other design choices such as sender-initiated and receiver-initiated strategies, have been used to design load balancing algorithms [1].
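The MRT definition above can be made concrete with a minimal sketch. The function name and the sample timestamps are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def mean_response_time(submit_times, departure_times):
    """Average of (departure instant - submission instant) over all jobs."""
    if len(submit_times) != len(departure_times) or not submit_times:
        raise ValueError("need equal-length, non-empty sequences")
    total = sum(d - s for s, d in zip(submit_times, departure_times))
    return total / len(submit_times)


# Three hypothetical jobs with response times of 2.0, 3.0, and 4.0 time units.
mrt = mean_response_time([0.0, 1.0, 2.0], [2.0, 4.0, 6.0])
print(mrt)  # 3.0
```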

Further, while balancing the loads, certain types of information, such as the number of jobs waiting in the queue to be processed and the current job arrival rates at each processor as well as at neighboring processors, may be exchanged among the processors for improving the overall performance. Based on the information that can be used, load balancing algorithms are classified as either dynamic or static. A dynamic algorithm makes its decisions according to the state of the system, where the state could refer to the above-mentioned information [2], [4], [7], [28]. On the other hand, a static algorithm follows a predetermined policy, without considering the state of the system [3], [5], [8].
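The static/dynamic distinction can be illustrated with a small sketch. Both dispatch rules below are generic textbook-style examples, not algorithms from the cited references; the processor names, queue lengths, and routing probabilities are assumptions made for illustration.

```python
import random


def static_dispatch(routing_probs, rng):
    """Static policy: sample a processor from fixed probabilities, ignoring system state."""
    processors = list(routing_probs)
    weights = [routing_probs[p] for p in processors]
    return rng.choices(processors, weights=weights, k=1)[0]


def dynamic_dispatch(queue_lengths):
    """Dynamic policy: use current state and pick the shortest queue."""
    return min(queue_lengths, key=queue_lengths.get)


rng = random.Random(0)
queues = {"p1": 5, "p2": 1, "p3": 3}        # hypothetical current queue lengths
probs = {"p1": 0.2, "p2": 0.5, "p3": 0.3}   # hypothetical predetermined split

print(dynamic_dispatch(queues))      # p2, the least loaded processor right now
print(static_dispatch(probs, rng))   # one of p1, p2, p3, independent of the queues
```

The static rule needs no state exchange at run time, which is the property that makes it attractive when communication overheads are high.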

In this paper, we attempt to formulate the static load balancing problem as a non-linear constrained optimization problem. Specifically, we consider the following real-life situation in our problem setting/definition. We consider a network of processors to which several classes of jobs arrive with a constant flow-rate for processing. Each processor may receive one or more classes of jobs and considers the entire set of jobs submitted to it for processing as its total input load. In keeping with the principle of load balancing, jobs are allowed to migrate from heavily loaded processors to lightly loaded processors for minimizing the MRT. We assume that the underlying network has an arbitrary topology that incurs non-zero finite communication delays while transferring jobs between processors. Further, for each class of jobs, the communication delay is modeled as a non-linear function that depends on the network traffic, and the delays are different on different links. Consequently, the nature of this function, whether convex or non-convex, influences the optimality of the solution. Also, we assume that each class of jobs demands a different processing rate, depending on the nature of the jobs. All these influencing factors are captured as constraints in our optimization problem. Further, we consider a non-preemptive style of processing of the jobs at a processor, i.e. a job that is currently being processed cannot be interrupted by any other class of job for processing.

We now discuss some literature related to our problem context and then point out some key differences in the problem formulation, solution approaches, and the techniques used to solve the problem.

Numerous studies have been conducted on a variety of load balancing algorithms in the literature. In the following, we present some related studies that are very close to our contributions in this paper. An excellent compilation of most of the load balancing/sharing algorithms until 1992 can be found in Ref. [9]. For a dynamic load balancing algorithm, it is unacceptable to frequently exchange state information because of the high communication overheads. In order to reduce the communication overheads, Anand et al. [2] proposed an estimated load information scheduling algorithm (ELISA) and Mitzenmacher [15] analyzed the extent to which old information can be used to estimate the state of the system. In Refs. [6], [10], [11], the authors have proven the correctness using randomization techniques, leading to an exponential improvement in balancing the loads. However, in Refs. [22], [29], it was pointed out that static load balancing algorithms are preferable when the system loads are moderate or light or when the communication overheads are still high.

Static load balancing algorithms are widely used in large-scale simulations [24], parallel programs [25], etc. For static algorithms, there are some differences among the network configurations. Kim and Kameda [31] proposed two algorithms for static load balancing in star and bus network configurations, respectively. Also, for optimal static load balancing, Li and Kameda [30] studied tree network configurations with two-way traffic. In Refs. [5], [8], the proposed algorithms addressed an arbitrary network configuration and hence are more applicable to practical distributed systems. However, the contributions mentioned above considered only a single class of jobs. In practice [32], the jobs in the system are divided into several classes and each class of jobs has its own priority. The study of multi-class jobs makes the system more flexible in handling different classes of jobs and is a right step in generalizing the study.

We discuss some more related work later in Section 4.1, while discussing the correspondence between a load balancing problem and a routing problem.

Our contributions are summarized as follows. The problem addressed in this paper is closely related to the earlier works reported in Refs. [3], [5], [8]; however, the key differences are as explained below. In Refs. [5], [8], the formulated non-linear constrained optimization problem considers only one class of jobs and shows that the delay functions are indeed convex and increasing functions. In Ref. [3], in contrast, the delay functions are assumed to be convex and the proposed algorithm is proven to be faster than the standard flow deviation (FD) algorithm [21], while consuming a larger amount of computation in evaluating certain inverse functions. Also, the problem formulated in that work considers the process of load distribution in a different manner. For instance, for each class of jobs and for each processor, say i, the neighboring processors are categorized into four different sets such that processors in each set send the jobs of this class to processor i based on certain rules. The rationale for this may be derived from application needs.

In our formulation, we relax this assumption and we consider all the jobs of a class that arrive at node i as a cumulative amount regardless of their origin. Thus, in our model, each processor is considered as an unbiased resource capable of processing the submitted jobs. Also, the delay functions are considered as arbitrary non-linear functions for the analysis to be more generic. Of course, as a possible extension, we also analyze the performance when convexities of the delay functions are to be considered. As a solution approach, we propose a novel methodology for the posed problem. We transform the problem into a routing problem and derive an optimal solution to the transformed problem. The correspondence between the load balancing problem and the routing problem is also discussed. Significant advantages of our proposed algorithm are discussed in Section 5. Thus, in this paper, we propose a static, decentralized load balancing algorithm for multi-class jobs in distributed network systems for minimizing the MRT of a job, using the concept of virtual routing.

The organization of the paper is as follows. In Section 2, we present a mathematical model and the problem definition. In Section 3, we formulate the problem and discuss the solution approach. In Section 4, we propose our algorithm and derive conditions for obtaining an optimal solution. We prove several important properties used in the design of the proposed algorithm and derive its rate of convergence. In Section 5, we report all our experimental results and present a detailed illustrative example to show the complete workings of our proposed algorithm for ease of understanding. We also present a simulation study to quantify the performance in terms of rate of convergence and solution quality. We highlight our contributions and discuss possible future extensions in Section 6.

Section snippets

Mathematical model

In this section, we shall formally introduce the problem we are tackling and propose the mathematical model used in our analysis. We will also introduce the required terminology, notations, and definitions that are used throughout the paper. In the following, we shall first describe the characteristics of the system, types of jobs that arrive to the system, and the resource constraints that are considered in the problem context.

Problem formulation and solution approach

In this section, based on the model introduced in Section 2, we shall formally define the problem that we want to address. In essence, we formulate the problem as a real-valued optimization problem with the objective of minimizing the MRT defined in Eq. (2). We state the following:

$$\text{Minimize}\quad D(\beta,x)=\frac{1}{\Phi}\sum_{k\in J}\left[\sum_{i\in N}\beta_i^k P_i^k(\beta_i^k)+\sum_{(i,j)\in L}x_{ij}^k G_{ij}^k(x_{ij}^k)\right]$$

subject to:

$$\phi_i^k+\sum_{j\in V_i}x_{ji}^k=\beta_i^k+\sum_{j\in V_i}x_{ij}^k,\quad k\in J,\ i\in N,$$

$$\phi_i^k\ge 0,\quad k\in J,\ i\in N,$$

$$x_{ij}^k\ge 0,\quad (i,j)\in L,\ k\in J.$$

Thus, the solution to our problem lies in determining the optimal values of β
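As a rough illustration of the formulation above, the following sketch evaluates the objective D(β, x) and checks the flow-conservation constraint for a candidate assignment. Several things here are assumptions rather than details from the paper: Φ is taken to be the total external arrival rate, the M/M/1-style delay functions `proc_delay` and `link_delay` are hypothetical stand-ins for P and G, and the two-node, one-class instance is invented. The sketch only evaluates given points; it is not the LBVR algorithm.

```python
def objective(beta, x, phi, proc_delay, link_delay):
    """D(beta, x): delay-weighted load summed over nodes and links, divided by Phi."""
    total_rate = sum(phi.values())  # assumption: Phi is the total external arrival rate
    node_cost = sum(b * proc_delay(i, k, b) for (i, k), b in beta.items())
    link_cost = sum(f * link_delay(i, j, k, f) for (i, j, k), f in x.items())
    return (node_cost + link_cost) / total_rate


def is_feasible(beta, x, phi, tol=1e-9):
    """Check non-negativity and phi + inflow == beta + outflow per (node, class)."""
    if any(b < 0 for b in beta.values()) or any(f < 0 for f in x.values()):
        return False
    for (i, k), b in beta.items():
        inflow = sum(f for (u, v, c), f in x.items() if v == i and c == k)
        outflow = sum(f for (u, v, c), f in x.items() if u == i and c == k)
        if abs(phi.get((i, k), 0.0) + inflow - b - outflow) > tol:
            return False
    return True


# Hypothetical M/M/1-style delays; the paper allows arbitrary non-linear functions.
def proc_delay(i, k, load, mu=5.0):
    return 1.0 / (mu - load)


def link_delay(i, j, k, flow, capacity=4.0):
    return 1.0 / (capacity - flow)


phi = {(1, "a"): 4.0, (2, "a"): 1.0}  # external arrivals of class "a" at nodes 1 and 2
no_migration = ({(1, "a"): 4.0, (2, "a"): 1.0}, {(1, 2, "a"): 0.0})
with_migration = ({(1, "a"): 2.5, (2, "a"): 2.5}, {(1, 2, "a"): 1.5})

for name, (beta, x) in [("no migration", no_migration), ("with migration", with_migration)]:
    assert is_feasible(beta, x, phi)
    print(name, round(objective(beta, x, phi, proc_delay, link_delay), 4))
```

In this invented instance, moving 1.5 units of flow from node 1 to node 2 lowers the objective from about 0.85 to about 0.52, even after paying the link delay. That only compares two feasible points and says nothing about optimality.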

Proposed algorithm and an optimal solution

Once the load balancing problem has been transformed into a routing problem, there are several algorithms proposed in the literature to solve such problems [16], [17], [18], [19]. However, in our problem context, it is important to establish the correspondence between the problems of routing and load balancing. Since our solution methodology proposes to use routing to balance the load in the network, we refer to our algorithm as load balancing via virtual routing (LBVR). We shall first present

Experimental results and discussions

Our algorithm LBVR retains two important features of practical interest: one is its decentralized style of working and the other is its simple structure, in terms of implementation ease. To appreciate one of the strengths of our algorithm, consider Fig. 3. In this figure, suppose the heavily loaded node 1 wants to send some jobs to the lightly loaded node 3 when link (1,3) has failed. In this case, our proposed algorithm LBVR considers adding a multi-hop routing path (1→2→3) for node 1
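The link-failure scenario above can be sketched with a plain breadth-first search that looks for an alternative multi-hop path once link (1,3) is removed. This is a generic graph search, not the LBVR procedure, and the three-node triangle topology is an assumption standing in for Fig. 3, which is not reproduced here.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search for a fewest-hops path over undirected links."""
    neighbors = {}
    for u, v in links:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # dst is unreachable


links = {(1, 2), (2, 3), (1, 3)}               # hypothetical triangle topology
print(shortest_path(links, 1, 3))              # [1, 3]
print(shortest_path(links - {(1, 3)}, 1, 3))   # [1, 2, 3] after link (1,3) fails
```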

Conclusions

A novel super-linear static load balancing algorithm, referred to as LBVR, using the concept of data routing in computer networks, has been proposed in this paper. The objective is to minimize the MRT of the jobs that arrive at a distributed system for processing. Our formulation considers multi-class jobs that arrive at processors of a distributed network system for processing. The jobs are considered for processing in a non-preemptive manner. In this paper, we have proposed a novel solution

Acknowledgements

The authors would like to thank Professor Jie Li, University of Tsukuba, Japan, for his valuable suggestions to their work in terms of clarifying certain key issues and implementation aspects of the algorithm reported in Ref. [3]. The comparison study conducted in this paper used the LK algorithm and the authors thank him for his guidance in carrying out an exact implementation of that algorithm. The authors would also like to thank the anonymous reviewers for their valuable suggestions, which

References (33)

  • L. Anand et al. ELISA: an estimated load information scheduling algorithm for distributed computing system. Computers and Mathematics with Applications (1999)
  • M.J. Zaki et al. Customized dynamic load balancing for a network of workstations. Journal of Parallel and Distributed Computing (1997)
  • N.G. Shivaratri et al. Load distributing for locally distributed systems. Computer (1992)
  • J. Li et al. Load balancing problems for multiclass jobs in distributed/parallel computer systems. IEEE Transactions on Computers (1998)
  • J. Watts et al. A practical approach to dynamic load balancing. IEEE Transactions on Parallel and Distributed Systems (1998)
  • K.W. Ross et al. Optimal load balancing and scheduling in a distributed computer system. Journal of the Association for Computing Machinery (1991)
  • Y. Amir et al. An opportunity cost approach for job assignment in a scalable computing cluster. IEEE Transactions on Parallel and Distributed Systems (2000)
  • G. Manimaran et al. An efficient dynamic scheduling algorithm for multiprocessor real-time systems. IEEE Transactions on Parallel and Distributed Systems (1998)
  • A.N. Tantawi et al. Optimal static load balancing in distributed computer systems. Journal of the Association for Computing Machinery (1985)
  • B. Shirazi et al. Scheduling and Load Balancing in Parallel and Distributed Systems (1995)
  • A.E. Kostin et al. A randomized contention-based load-balancing protocol for a distributed multiserver queuing system. IEEE Transactions on Parallel and Distributed Systems (2000)
  • M. Mitzenmacher. The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems (2001)
  • D.J. Wilde et al. Foundations of Optimization (1967)
  • N.U. Prabhu. Foundations of Queuing Theory (1997)
  • M. Avriel. Nonlinear Programming Analysis and Methods (1997)
  • M. Mitzenmacher. How useful is old information? IEEE Transactions on Parallel and Distributed Systems (2000)