Dimension-exchange token distribution on the mesh and the torus

doi:10.1016/S0167-8191(98)00006-4

Parallel Computing

Volume 24, Issue 2, February 1998, Pages 247-265

https://doi.org/10.1016/S0167-8191(98)00006-4

Abstract

A solution to the token distribution problem is presented for the 2-dimensional mesh and torus, based on the dimension-exchange strategy. The approach is shown to reduce the discrepancy Δ between maximum and minimum processor loads to δ in worst-case optimal Θ((Δ−δ)·n) time steps, where 2≤δ<Δ in the case of an n-by-n mesh, and 4≤δ<Δ in the case of an n-by-n torus.

Introduction

One of the fundamental data distribution problems on parallel architectures is that of token distribution, a static variant of the well-studied load balancing problem. Each processing element (PE) of the parallel architecture possesses an initial set of tokens, each of which represents a task to be performed; the number of tokens stored at a particular PE is called the load of that PE. Ideally, one would prefer that the distribution of the tokens over the set of PEs be as even as possible, as imbalances would result in a delay in the time needed to perform all the tasks. The goal of a token distribution algorithm is to redistribute the tokens in such a way that the final loads of the PEs differ as little as possible. Here it is assumed that each token requires only a constant amount of time to send from one PE to an adjacent PE, and that no tokens are created or destroyed before the redistribution is complete.

There are many data distribution methods which achieve a balanced token distribution by gathering and making use of a certain amount of global information [1, 2, 3]. Such methods are often unsatisfactory, in that they do not consider the practical limitations of the parallel architecture, or result in algorithms that are unnecessarily complex. One method that requires no such global information is the scheme for load balancing within networks due to Aiello et al. [4] and analyzed by Ghosh et al. [5]. At each time step, every node v of the network receives a token from each of its neighbours having at least 2d+1 more tokens than v, where d is the maximum degree of any node of the network. The scheme is guaranteed to balance to within a difference of O(d² log n/α) tokens in O(Δ/α) time steps, where n is the number of nodes of the network, Δ is the initial difference between the maximum and minimum processor loads, and α≤d is a parameter based on the topology of the network. In Ref. [5], networks and initial token distributions are shown for which these upper bounds are tight. For some network topologies, however, a better balance can be achieved.
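The local rule described above can be sketched as a single synchronous step. This is only an illustration of the rule as stated in the text; the function name, the dictionary-based graph representation, and the three-node example are assumptions made here, not taken from Refs. [4] or [5].

```python
def local_balance_step(loads, neighbours):
    """One synchronous step of the threshold rule described in the text.

    Every node v receives one token from each neighbour u holding at
    least 2d+1 more tokens than v, where d is the maximum degree.
    `loads` maps node -> token count; `neighbours` maps node -> list of
    adjacent nodes (hypothetical representation chosen for this sketch).
    """
    d = max(len(adj) for adj in neighbours.values())
    change = {v: 0 for v in loads}
    for v, adj in neighbours.items():
        for u in adj:
            # Decisions use the loads at the start of the step.
            if loads[u] - loads[v] >= 2 * d + 1:
                change[u] -= 1
                change[v] += 1
    return {v: loads[v] + change[v] for v in loads}


# Path of three nodes: d = 2, so the threshold is 5 tokens.
path = {0: [1], 1: [0, 2], 2: [1]}
before = {0: 12, 1: 3, 2: 0}
after = local_balance_step(before, path)
print(after)
```

In this example only the edge between nodes 0 and 1 exceeds the threshold, so a single token moves; the gap of 3 between nodes 1 and 2 is left alone.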

Another data distribution method that requires no global information is the so-called dimension-exchange method, which is based on the repetitive application of an extremely simple and scalable local exchange protocol. To be able to implement a dimension-exchange algorithm on a particular parallel architecture, the communication edges of the underlying topology must be partitionable (or colourable) into sets whereby no two edges of the same set are incident on the same processor. For networks having hypercube or mesh-connected topologies, the edges can be partitioned in a natural fashion, according to the dimension of the network along which the edge is oriented. For other networks, partitions may be based on sets of matchings [6].
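As an illustration of the partition requirement, the sketch below colours the edges of an n-by-n mesh and checks that every colour class is a matching. Because an interior mesh node has two neighbours along each dimension, the sketch assumes the edges are split by dimension and additionally by parity, giving four colours; this refinement and all names are assumptions for illustration, not necessarily the partition used in the paper.

```python
def mesh_edge_colouring(n):
    """Partition the edges of an n-by-n mesh into 4 colour classes.

    Assumed scheme: colour = (dimension, parity of the smaller
    coordinate along that dimension). Nodes are (row, column) pairs.
    """
    colours = {0: [], 1: [], 2: [], 3: []}
    for r in range(n):
        for c in range(n):
            if c + 1 < n:  # horizontal edge to the right-hand neighbour
                colours[c % 2].append(((r, c), (r, c + 1)))
            if r + 1 < n:  # vertical edge to the neighbour below
                colours[2 + r % 2].append(((r, c), (r + 1, c)))
    return colours


def is_matching(edges):
    """True if no two edges in the list share a processor."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


classes = mesh_edge_colouring(5)
print(all(is_matching(edges) for edges in classes.values()))
```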

Dimension-exchange algorithms use the edge-colouring of a network to pair processors for data exchange, and are invariably of the following general form:

Dimension-Exchange Algorithm:

LOOP
  FOR i = colour 1 TO k (* k colours *)
    Over all pairs of processors connected by edges of colour i, compare values and exchange;
  END FOR
END LOOP
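The general form above can be sketched as follows. The exchange rule is passed in as a function, since the text leaves "compare values and exchange" open; the token-splitting rule shown (the heavier processor keeps the odd token) and the 4-node ring with two colour classes are assumptions chosen only to make the example concrete.

```python
def dimension_exchange_sweep(loads, colour_classes, exchange):
    """One pass of the FOR loop: visit the colour classes in order and
    apply `exchange` to the two endpoints of every edge of that colour."""
    loads = list(loads)
    for edges in colour_classes:
        # Edges of one colour form a matching, so they could all be
        # processed in parallel; here they are processed sequentially.
        for u, v in edges:
            loads[u], loads[v] = exchange(loads[u], loads[v])
    return loads


def split_tokens(a, b):
    """Assumed exchange rule: share the tokens as evenly as possible,
    leaving any odd token with the endpoint that was heavier."""
    total = a + b
    high, low = (total + 1) // 2, total // 2
    return (high, low) if a >= b else (low, high)


# Ring of 4 processors, edges 2-coloured so each class is a matching.
ring_colours = [[(0, 1), (2, 3)], [(1, 2), (3, 0)]]
result = dimension_exchange_sweep([8, 0, 0, 0], ring_colours, split_tokens)
print(result)
```

The outer LOOP of the pseudocode corresponds to calling `dimension_exchange_sweep` repeatedly until the desired discrepancy is reached.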

The dimension-exchange approach has been used successfully in solutions to sorting problems (for example, the well-known algorithm of Batcher [7]), and forms an integral part of many of the so-called 'hot-potato' routing algorithms [8, 9, 10].

Due to their simplicity and scalability, dimension-exchange techniques have been studied by many researchers for their applicability to data-distribution problems. The first was Cybenko [11] in 1987, who proposed an algorithm for the d-dimensional hypercube under the assumption that the load in each PE was infinitely divisible (that is, a real-valued quantity able to be split among processors in an arbitrary fashion). Cybenko showed that if every exchange results in an equal sharing of the load between the two PEs involved, then after d iterations the PE loads would be perfectly balanced.
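A minimal sketch of this equal-sharing scheme on a hypercube is given below, assuming processors are numbered 0 to 2^d − 1 and that the neighbour along dimension i is obtained by flipping bit i. The function name and the example loads are illustrative assumptions.

```python
def cybenko_hypercube(loads):
    """Average real-valued loads across each hypercube dimension in turn.

    `loads` must have length 2**d; processor u is paired with
    u XOR 2**i during the exchange along dimension i.
    """
    size = len(loads)
    d = size.bit_length() - 1
    if size != 1 << d:
        raise ValueError("number of processors must be a power of two")
    loads = list(loads)
    for i in range(d):
        for u in range(size):
            v = u ^ (1 << i)
            if u < v:  # handle each pair once
                loads[u] = loads[v] = (loads[u] + loads[v]) / 2
    return loads


# 3-dimensional hypercube holding 16 units of load in total.
balanced = cybenko_hypercube([8.0, 0.0, 4.0, 2.0, 1.0, 0.0, 0.0, 1.0])
print(balanced)
```

After the d = 3 exchanges every processor holds the global average of 2.0, in line with the result attributed to Cybenko above.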

This original work prompted a steady stream of research into the analysis of dimension-exchange algorithms. In 1988 Ranka et al. [12] studied the operation of Cybenko's algorithm empirically for the d-dimensional hypercube, under the more realistic assumption that the loads were finitely divisible (that is, representable as a set of tokens). They observed that the difference between the maximum and minimum number of tasks over all PEs of the network (called the discrepancy) would eventually fall to at most d. Soon after, Hosseini et al. [6] and Plaxton [13] confirmed this observation by providing an algorithm that after d steps reduced the discrepancy to at most d. In addition, Hosseini et al. demonstrated that, for infinitely-divisible loads, Cybenko's analysis could be generalized to arbitrary k-colourable networks.
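The token version of the hypercube exchange can be sketched as below. The rounding rule (the lower-numbered processor of each pair keeps the odd token) is an assumption made for this sketch and is not necessarily the rule used in Refs. [6], [12] or [13]; the example loads are arbitrary.

```python
def token_exchange_hypercube(loads):
    """Integer variant of the hypercube exchange: each pair shares its
    tokens as evenly as possible along dimensions 0, 1, ..., d-1."""
    size = len(loads)
    d = size.bit_length() - 1
    if size != 1 << d:
        raise ValueError("number of processors must be a power of two")
    loads = list(loads)
    for i in range(d):
        for u in range(size):
            v = u ^ (1 << i)
            if u < v:
                total = loads[u] + loads[v]
                # Assumed tie-break: processor u keeps the odd token.
                loads[u], loads[v] = (total + 1) // 2, total // 2
    return loads


def discrepancy(loads):
    """Difference between the maximum and minimum load."""
    return max(loads) - min(loads)


# 4-dimensional hypercube (16 processors) with an uneven start.
start = [17, 0, 3, 9, 0, 0, 25, 4, 1, 1, 0, 12, 7, 0, 2, 30]
final = token_exchange_hypercube(start)
print(discrepancy(start), discrepancy(final))
```

Each exchange introduces a rounding error of at most half a token, so after d exchanges every load lies within d/2 of the global average, which is consistent with the discrepancy bound of d quoted above.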

In 1992 Xu and Lau [14] extended the work of Hosseini et al. by showing that for some topologies, the rate at which the global discrepancy converged to zero could be optimized by altering the ratio with which infinitely-divisible loads were locally balanced. They showed that the optimal ratios for the linear array, ring, 2-dimensional mesh and 2-dimensional torus all depend on their scales. These ratios were provided in an unpublished technical report appearing in the same year [15].

To date, a large body of results exists detailing the performance of the dimension-exchange approach over infinitely-divisible loads; on the other hand, little has been known concerning dimension-exchange for finitely-divisible loads (tokens) on meshes and tori of constant degree. In this paper, we present asymptotically-optimal dimension-exchange algorithms for token distribution on the two-dimensional mesh and torus. For the n-by-n mesh, we prove that if the discrepancy is greater than 3, 16n steps of the algorithm suffice to reduce the discrepancy by 1, and if the discrepancy is equal to 3, 22n steps suffice. For the n-by-n torus when n is restricted to be even, we prove that if the discrepancy is greater than 4, 14n steps of the algorithm suffice to reduce the discrepancy by 1. These results are the first to establish that dimension-exchange techniques lead to optimal solutions for finitely-divisible load balancing on a mesh-connected network of constant degree.
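Chaining the per-unit bounds stated above gives an explicit upper bound on the number of steps needed to go from discrepancy Δ to δ. The sketch below simply adds up those bounds; the function names are hypothetical, and the totals are an arithmetic consequence of the quoted figures rather than a formula taken from the paper.

```python
def mesh_steps_upper_bound(n, big_delta, small_delta):
    """Steps sufficient on an n-by-n mesh to reduce the discrepancy from
    big_delta to small_delta: 16n per unit while the discrepancy exceeds
    3, and 22n for the final reduction from 3 to 2."""
    if not 2 <= small_delta <= big_delta:
        raise ValueError("require 2 <= small_delta <= big_delta")
    steps = 0
    for current in range(big_delta, small_delta, -1):
        steps += 16 * n if current > 3 else 22 * n
    return steps


def torus_steps_upper_bound(n, big_delta, small_delta):
    """Steps sufficient on an n-by-n torus (n even): 14n per unit while
    the discrepancy exceeds 4."""
    if n % 2 != 0:
        raise ValueError("the stated torus bound assumes n is even")
    if not 4 <= small_delta <= big_delta:
        raise ValueError("require 4 <= small_delta <= big_delta")
    return 14 * n * (big_delta - small_delta)


print(mesh_steps_upper_bound(8, 10, 2))   # 7 * 16 * 8 + 22 * 8
print(torus_steps_upper_bound(8, 10, 4))  # 6 * 14 * 8
```

Both totals grow as (Δ−δ)·n, matching the Θ((Δ−δ)·n) running time given in the abstract.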

The organization of the paper is as follows: in Section 2, we describe the model of computation. In Section 3, we prove a lower bound on the complexity of the token-distribution problem, and propose dimension-exchange algorithms for the mesh and the torus. The notation and preliminary concepts that we use in the analysis of the algorithms are introduced in Section 4. The analysis of the algorithm on the torus appears in Section 5, and in Section 6, the result for the torus is extended to the mesh. Concluding remarks are made in Section 7.

Section snippets

Model of computation

One of the simplest and most practical fixed connection networks is the single-port mesh-connected array. In this model, the processing elements (PEs) are arranged in a square grid, and are connected to their neighbours by unidirectional communication links. The PEs of the mesh may send or receive at most one message at any one time. This model is considerably weaker than the MIMD-model, where bidirectional links are assumed and concurrent communication to all the neighbours is allowed.

The

The token distribution problem and algorithm

The token distribution problem TD(A; Δ,M,δ) was first posed by Peleg and Upfal [16], and can be stated as follows: given a parallel architecture A containing n² processors P₁,…,P_n², with each processor P_i containing a stack of μ≤l(P_i)≤M tokens (for all 1≤i≤n²) and with a global discrepancy between loads equal to Δ=M−μ, distribute the tokens such that at the end the global discrepancy has been reduced to at most δ.
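The problem statement can be turned into a small checker, sketched below under the assumption that the loads l(P_i) are stored as an n-by-n grid of non-negative integers. The function names and the example grids are hypothetical and serve only to make the definitions of Δ and δ concrete.

```python
def grid_discrepancy(grid):
    """Global discrepancy M - mu: maximum load minus minimum load."""
    flat = [load for row in grid for load in row]
    return max(flat) - min(flat)


def solves_td(initial, final, small_delta):
    """Check that `final` is a valid outcome for `initial`: no tokens
    were created or destroyed, and the discrepancy is at most
    small_delta."""
    tokens_before = sum(sum(row) for row in initial)
    tokens_after = sum(sum(row) for row in final)
    return (tokens_before == tokens_after
            and grid_discrepancy(final) <= small_delta)


initial = [[9, 0],
           [1, 4]]
final = [[4, 3],
         [3, 4]]
print(grid_discrepancy(initial), solves_td(initial, final, 2))
```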

Notation and preliminaries

In this section, we present notation and preliminary observations for the analysis of 2DEB over the torus $T$.

Consider the situation where row i of torus $T$ initially contains one token per location, and all other rows contain no tokens. Over the course of 2n steps of 2DEB, the tokens migrate from row to row through the torus, shifting one location every other step. After the 2nth step, the tokens once again fill their starting row i. The direction of the migration depends on the parity of i: if i

Analysis of 2DEB on the torus

In this section, Algorithm 2DEB is proven to optimally solve token distribution problems TD($T$; Δ,M,δ) for tori $T$ and δ≥4. When δ<4, there are instances for which Algorithm 2DEB fails; one such instance is shown in Fig. 4.

Lemma 5. Let $T$ be an n-by-n torus (n even) whose elements are non-negative integers, and let M and μ be the maximum and minimum values of these elements, respectively. If M−μ>2, then after n steps of Algorithm 2DEB on $T$, no row or column of $T$ can contain piles α and β from

Extension of analysis to the mesh

In this section, we show how the results of Section 5 for the torus lead directly to a proof that Algorithm 2DEB optimally solves token distribution problems TD($M$; Δ,M,δ) for mesh $M$ and Δ≥δ≥2. The result is obtained via a simulation of the mesh by a torus of twice the side length, to which the results of Section 5 are applied. For δ<2, clearly, there are token distribution problems TD($T$; Δ,M,δ) which cannot be solved by any algorithm.

Unlike the analysis of Section 5 for the torus, the analysis

Conclusion

In this paper, we presented a dimension-exchange data distribution algorithm and proved that it is asymptotically-optimal for token distribution on the two-dimensional mesh and torus. The benefits of the dimension-exchange approach, in that it is extremely simple, uses only locally-available information and is completely scalable, cannot be overstated. The analysis shows for the first time that dimension-exchange techniques can lead to optimal solutions for token distribution on a

References (16)

S.H. Hosseini et al., Analysis of graph coloring based distributed load balancing algorith, J. Parallel Distributed Comput. (1990)
G. Cybenko, Dynamic load balancing for distributed memory multiprocessors, J. Parallel Distributed Comput. (1989)
C.Z. Xu et al., Analysis of the generalized dimension exchange method for dynamic load balancing, J. Parallel Distributed Comput. (1992)
D. Diderich, H. Gengler, S. Ubéda, An efficient algorithm for solving the token distribution problem on k-ary d-cube...
F. Meyer auf der Heide, B. Oesterdiekhoff, R. Wanka, Strongly adaptive token distribution, in: Proceedings of the 20th...
G. Turner, H. Schröder, Token distribution on reconfigurable d-dimensional meshes, in: Proc. 1st IEEE International...
W. Aiello, B. Awerbuch, B. Maggs, S. Rao, Approximate load balancing on dynamic and asynchronous networks, in: Proc....
B. Ghosh, F.T. Leighton, B.M. Maggs, S. Muthukrishnan, C.G. Plaxton, R. Rajaraman, A.W. Richa, R.E. Tarjan, D....

There are more references available in the full text version of this article.

