Elsevier

Parallel Computing

Volume 24, Issue 2, February 1998, Pages 247-265
Dimension-exchange token distribution on the mesh and the torus

https://doi.org/10.1016/S0167-8191(98)00006-4

Abstract

A solution to the token distribution problem is presented for the 2-dimensional mesh and torus, based on the dimension-exchange strategy. The approach is shown to reduce the discrepancy Δ between maximum and minimum processor loads to δ in worst-case optimal Θ((Δ−δ)n) time steps, where 2≤δ<Δ in the case of an n-by-n mesh, and 4≤δ<Δ in the case of an n-by-n torus.

Introduction

One of the fundamental data distribution problems on parallel architectures is that of token distribution, a static variant of the well-studied load balancing problem. Each processing element (PE) of the parallel architecture possesses an initial set of tokens, each of which represents a task to be performed; the number of tokens stored at a particular PE is called the load of that PE. Ideally, one would prefer that the distribution of the tokens over the set of PEs be as even as possible, as imbalances would result in a delay in the time needed to perform all the tasks. The goal of a token distribution algorithm is to redistribute the tokens in such a way that the final loads of the PEs differ as little as possible. Here it is assumed that each token requires only a constant amount of time to send from one PE to an adjacent PE, and that no tokens are created or destroyed before the redistribution is complete.

There are many data distribution methods which achieve a balanced token distribution by gathering and making use of a certain amount of global information [1, 2, 3]. Such methods are often unsatisfactory, in that they do not consider the practical limitations of the parallel architecture, or result in algorithms that are unnecessarily complex. One method that requires no such global information is the scheme for load balancing within networks due to Aiello et al. [4] and analyzed by Ghosh et al. [5]. At each time step, every node v of the network receives a token from each of its neighbours having at least 2d+1 more tokens than v, where d is the maximum degree of any node of the network. The scheme is guaranteed to balance to within a difference of O(d² log n/α) tokens in O(Δ/α) time steps, where n is the number of nodes of the network, Δ is the initial difference between the maximum and minimum processor loads, and α is a parameter based on the topology of the network. In Ref. [5], networks and initial token distributions are shown for which these upper bounds are tight. For some network topologies, however, a better balance can be achieved.
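The local rule described above can be sketched for a single synchronous step. This is a minimal illustration only: the function names, the adjacency-list representation, and the ring example are assumptions made here, not taken from Refs. [4] or [5].

```python
def local_balance_step(loads, neighbours, d):
    """One synchronous step of the local rule: node v receives one token
    from each neighbour u holding at least 2d+1 more tokens than v.
    All comparisons use the loads as they were at the start of the step."""
    new_loads = list(loads)
    for v, nbrs in enumerate(neighbours):
        for u in nbrs:
            if loads[u] - loads[v] >= 2 * d + 1:
                new_loads[u] -= 1  # u gives up one token
                new_loads[v] += 1  # v receives it
    return new_loads


def ring_neighbours(n):
    """Adjacency lists of an n-node ring (illustrative topology, degree 2)."""
    return [[(v - 1) % n, (v + 1) % n] for v in range(n)]


# Hypothetical instance: 12 tokens on node 0 of a 6-node ring, so d = 2
# and the transfer threshold is 2d+1 = 5.
loads = [12, 0, 0, 0, 0, 0]
after = local_balance_step(loads, ring_neighbours(6), d=2)
```

In this instance only the two neighbours of node 0 see a difference of at least 5, so each receives one token and the total number of tokens is unchanged.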

Another data distribution method that requires no global information is the so-called dimension-exchange method, which is based on the repetitive application of an extremely simple and scalable local exchange protocol. To be able to implement a dimension-exchange algorithm on a particular parallel architecture, the communication edges of the underlying topology must be partitionable (or colourable) into sets whereby no two edges of the same set are incident on the same processor. For networks having hypercube or mesh-connected topologies, the edges can be partitioned in a natural fashion, according to the dimension of the network along which the edge is oriented. For other networks, partitions may be based on sets of matchings [6].

Dimension-exchange algorithms use the edge-colouring of a network to pair processors for data exchange, and are invariably of the following general form:

Dimension-Exchange Algorithm:

  • LOOP

  • FOR i=colour 1 to k (*k colours*)

    • Over all pairs of processors connected by edges of colour i, compare values and exchange;

  • END
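The general form above can be sketched on an n-by-n torus with n even, whose edges split naturally into four matchings (two per dimension). The exchange rule used here, in which the heavier PE passes a single token to the lighter one when their loads differ by at least 2, is an assumption chosen for illustration; it is not claimed to be the rule of the algorithm analysed in this paper, and all names are hypothetical.

```python
def torus_colour_pairs(n):
    """Partition the edges of an n-by-n torus (n even) into 4 matchings:
    horizontal edges starting at even/odd columns, then vertical edges
    starting at even/odd rows."""
    assert n % 2 == 0, "this colouring needs an even side length"
    colours = []
    for parity in (0, 1):  # the two horizontal colours
        colours.append([((r, c), (r, (c + 1) % n))
                        for r in range(n) for c in range(parity, n, 2)])
    for parity in (0, 1):  # the two vertical colours
        colours.append([((r, c), ((r + 1) % n, c))
                        for c in range(n) for r in range(parity, n, 2)])
    return colours


def exchange_sweep(load, colours):
    """One pass of the FOR loop: visit each colour in turn and apply the
    assumed compare-and-exchange rule to every pair of that colour."""
    for pairs in colours:
        for a, b in pairs:
            hi, lo = (a, b) if load[a] >= load[b] else (b, a)
            if load[hi] - load[lo] >= 2:
                load[hi] -= 1
                load[lo] += 1


# Hypothetical instance: all 16 tokens start on one PE of a 4-by-4 torus.
n = 4
colours = torus_colour_pairs(n)
load = {(r, c): 0 for r in range(n) for c in range(n)}
load[(0, 0)] = 16
exchange_sweep(load, colours)
```

Because every colour class is a matching, the exchanges within one colour touch disjoint pairs and could run in parallel; the sequential inner loop here only simulates that.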

The dimension-exchange approach has been used successfully for solutions to the problem of sorting (for example, the well-known algorithm of Batcher [7]), and forms an integral part of many of the so-called `hot-potato' routing algorithms [8, 9, 10].

Due to their simplicity and scalability, dimension-exchange techniques have been studied by many researchers for their applicability to data-distribution problems. The first was Cybenko [11] in 1987, who proposed an algorithm for the d-dimensional hypercube under the assumption that the load in each PE was infinitely-divisible, that is, a real-valued quantity able to be split among processors in an arbitrary fashion. Cybenko showed that if every exchange results in an equal sharing of the load between the two PEs involved, then after d iterations the PE loads would be perfectly balanced.
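The equal-sharing exchange on a hypercube can be sketched as follows, assuming PEs are numbered 0 to 2^d − 1 and that two PEs are neighbours along dimension i when their numbers differ in bit i. The function name and the example loads are illustrative.

```python
def hypercube_average_sweep(loads, d):
    """One sweep over the d dimensions of a hypercube with real-valued
    loads: along each dimension, every pair of neighbours replaces both
    loads by their average."""
    loads = [float(x) for x in loads]
    for i in range(d):
        for v in range(1 << d):
            u = v ^ (1 << i)      # neighbour of v along dimension i
            if v < u:             # handle each pair once
                loads[v] = loads[u] = (loads[v] + loads[u]) / 2
    return loads


# Hypothetical instance: all load starts on PE 0 of a 3-dimensional hypercube.
balanced = hypercube_average_sweep([8, 0, 0, 0, 0, 0, 0, 0], d=3)
```

After the single sweep every PE holds the global mean, here 1.0, which matches the statement that d iterations suffice for infinitely-divisible loads.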

This original work prompted a steady stream of research into the analysis of dimension-exchange algorithms. In 1988 Ranka et al. [12] studied the operation of Cybenko's algorithm empirically for the d-dimensional hypercube, under the more realistic assumption that the loads were finitely-divisible, that is, representable as a set of tokens. They observed that the difference between the maximum number and minimum number of tasks over all PEs of the network (called the discrepancy) would eventually fall to at most d. Soon after, Hosseini et al. [6] and Plaxton [13] confirmed this observation by providing an algorithm that after d steps reduced the discrepancy to at most d. In addition, Hosseini et al. demonstrated that, for infinitely-divisible loads, Cybenko's analysis could be generalized to arbitrary k-colourable networks.
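With indivisible tokens, an exchange can only split a pair's combined load into a floor and a ceiling. The sketch below assumes that the lower-numbered PE of each pair keeps the extra token; this tie-breaking choice and the names are assumptions made here, and the code is not claimed to reproduce the algorithms of Refs. [6] or [13].

```python
def hypercube_token_sweep(loads, d):
    """One sweep over the d dimensions of a hypercube with integer loads:
    each pair of neighbours splits its combined tokens as evenly as
    possible (ceiling to the lower-numbered PE, floor to the other)."""
    loads = list(loads)
    for i in range(d):
        for v in range(1 << d):
            u = v ^ (1 << i)      # neighbour of v along dimension i
            if v < u:             # handle each pair once
                total = loads[v] + loads[u]
                loads[v], loads[u] = (total + 1) // 2, total // 2
    return loads


def discrepancy(loads):
    """Difference between the maximum and minimum load."""
    return max(loads) - min(loads)


# Hypothetical instance: 7 tokens on PE 0 of a 3-dimensional hypercube.
result = hypercube_token_sweep([7, 0, 0, 0, 0, 0, 0, 0], d=3)
```

Each dimension can widen the spread within a subcube by at most one token through rounding, which is why a single sweep leaves a discrepancy of at most d rather than zero.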

In 1992 Xu and Lau [14] extended the work of Hosseini et al. by showing that for some topologies, the rate at which the global discrepancy converged to zero could be optimized by altering the ratio with which infinitely-divisible loads were locally balanced. They showed that the optimal ratios for the linear array, ring, 2-dimensional mesh and 2-dimensional torus all depend on their scales. These ratios were provided in an unpublished technical report appearing in the same year [15].

To date, a large body of results exists detailing the performance of the dimension-exchange approach over infinitely-divisible loads; on the other hand, little has been known concerning dimension-exchange for finitely-divisible loads (tokens) on meshes and tori of constant degree. In this paper, we present asymptotically-optimal dimension-exchange algorithms for token distribution on the two-dimensional mesh and torus. For the n-by-n mesh, we prove that if the discrepancy is greater than 3, 16n steps of the algorithm suffice to reduce the discrepancy by 1, and if the discrepancy is equal to 3, 22n steps suffice. For the n-by-n torus when n is restricted to be even, we prove that if the discrepancy is greater than 4, 14n steps of the algorithm suffice to reduce the discrepancy by 1. These results are the first to establish that dimension-exchange techniques lead to optimal solutions for finitely-divisible load balancing on a mesh-connected network of constant degree.
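The per-unit step counts quoted above can be added up to give an overall upper bound, assuming the reductions are simply applied one after another until the target discrepancy δ is reached. The helper names below are hypothetical and the constants are exactly those stated in the text.

```python
def mesh_step_bound(n, Delta, delta):
    """Upper bound on the steps needed on an n-by-n mesh to go from
    discrepancy Delta to delta: 16n per unit while the discrepancy
    exceeds 3, plus 22n for the final reduction from 3 to 2."""
    assert 2 <= delta < Delta
    steps = 16 * n * (Delta - max(delta, 3))
    if delta == 2:
        steps += 22 * n
    return steps


def torus_step_bound(n, Delta, delta):
    """Upper bound on an n-by-n torus (n even): 14n steps per unit of
    discrepancy while the discrepancy exceeds 4."""
    assert n % 2 == 0 and 4 <= delta < Delta
    return 14 * n * (Delta - delta)


# Hypothetical instance: an 8-by-8 network with initial discrepancy 10.
mesh_to_3 = mesh_step_bound(8, 10, 3)     # 16 * 8 * 7
mesh_to_2 = mesh_step_bound(8, 10, 2)     # 16 * 8 * 7 + 22 * 8
torus_to_4 = torus_step_bound(8, 10, 4)   # 14 * 8 * 6
```

Both bounds grow linearly in (Δ−δ)n, which is the form of the Θ((Δ−δ)n) result stated in the abstract.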

The organization of the paper is as follows: in Section 2, we describe the model of computation. In Section 3, we prove a lower bound on the complexity of the token-distribution problem, and propose dimension-exchange algorithms for the mesh and the torus. The notation and preliminary concepts that we use in the analysis of the algorithms are introduced in Section 4. The analysis of the algorithm on the torus appears in Section 5, and in Section 6, the result for the torus is extended to the mesh. Concluding remarks are made in Section 7.

Model of computation

One of the simplest and most practical fixed connection networks is the single-port mesh-connected array. In this model, the processing elements (PEs) are arranged in a square grid, and are connected to their neighbours by unidirectional communication links. The PEs of the mesh may send or receive at most one message at any one time. This model is considerably weaker than the MIMD-model, where bidirectional links are assumed and concurrent communication to all the neighbours is allowed.

The

The token distribution problem and algorithm

The token distribution problem TD(A; Δ, M, δ) was first posed by Peleg and Upfal [16], and can be stated as follows: given a parallel architecture A containing n² processors P_1, …, P_{n²}, with each processor P_i containing a stack of l(P_i) tokens where μ ≤ l(P_i) ≤ M (for all 1 ≤ i ≤ n²), and with a global discrepancy between loads equal to Δ = M − μ, distribute the tokens such that at the end the global discrepancy has been reduced to at most δ.
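The quantities in this definition can be made concrete with a small checker. It is a sketch only: the function names and the flattened list representation of the PE loads are assumptions introduced here.

```python
def discrepancy(loads):
    """Global discrepancy: maximum load minus minimum load."""
    return max(loads) - min(loads)


def solves_td(initial, final, delta):
    """Check that `final` is a valid answer to a token distribution
    instance starting from `initial`: same PEs, no tokens created or
    destroyed, and final discrepancy at most delta."""
    return (len(initial) == len(final)
            and sum(initial) == sum(final)
            and discrepancy(final) <= delta)


# Hypothetical instance on 4 PEs (a 2-by-2 network, flattened): M = 9, mu = 0.
initial = [9, 0, 0, 0]
final = [3, 2, 2, 2]
Delta = discrepancy(initial)
ok = solves_td(initial, final, delta=2)
```

The checker says nothing about how many steps a redistribution takes; it only captures the end condition of the problem statement.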

Notation and preliminaries

In this section, we present notation and preliminary observations for the analysis of 2DEB over the torus T.

Consider the situation where row i of torus T initially contains one token per location, and all other rows contain no tokens. Over the course of 2n steps of 2DEB, the tokens migrate from row to row through the torus, shifting one location every other step. After the 2nth step, the tokens once again fill their starting row i. The direction of the migration depends on the parity of i: if i

Analysis of 2DEB on the torus

In this section, Algorithm 2DEB is proven to optimally solve token distribution problems TD(T;Δ,M,δ) for tori T, and δ≥4. When δ<4, there are instances for which Algorithm 2DEB fails; one such instance is shown in Fig. 4.
Lemma 5. Let T be an n-by-n torus (n even) whose elements are non-negative integers, and let M and μ be the maximum and minimum values of these elements, respectively. If M−μ>2, then after n steps of Algorithm 2DEB on T, no row or column of T can contain piles α and β from

Extension of analysis to the mesh

In this section, we show how the results of Section 5 for the torus lead directly to a proof that Algorithm 2DEB optimally solves token distribution problems TD(M;Δ,M,δ) for mesh M, and Δ>δ≥2. The result is obtained via a simulation of the mesh by a torus of twice the sidelength, upon which the results of Section 5 are applied. For δ<2, clearly, there are token distribution problems TD(T;Δ,M,δ) which cannot be solved by any algorithm.

Unlike the analysis of Section 5 for the torus, the analysis

Conclusion

In this paper, we presented a dimension-exchange data distribution algorithm and proved that it is asymptotically-optimal for token distribution on the two-dimensional mesh and torus. The benefits of the dimension-exchange approach, in that it is extremely simple, uses only locally-available information and is completely scalable, cannot be overstated. The analysis shows for the first time that dimension-exchange techniques can lead to optimal solutions for token distribution on a
