Dimension-exchange token distribution on the mesh and the torus
Introduction
One of the fundamental data distribution problems on parallel architectures is that of token distribution, a static variant of the well-studied load balancing problem. Each processing element (PE) of the parallel architecture possesses an initial set of tokens, each of which represents a task to be performed; the number of tokens stored at a particular PE is called the load of that PE. Ideally, one would prefer that the distribution of the tokens over the set of PEs be as even as possible, as imbalances would result in a delay in the time needed to perform all the tasks. The goal of a token distribution algorithm is to redistribute the tokens in such a way that the final loads of the PEs differ as little as possible. Here it is assumed that each token requires only a constant amount of time to send from one PE to an adjacent PE, and that no tokens are created or destroyed before the redistribution is complete.
There are many data distribution methods which achieve a balanced token distribution by gathering and making use of a certain amount of global information [1], [2], [3]. Such methods are often unsatisfactory, in that they do not consider the practical limitations of the parallel architecture, or result in algorithms that are unnecessarily complex. One method that requires no such global information is the scheme for load balancing within networks due to Aiello et al. [4] and analyzed by Ghosh et al. [5]. At each time step, every node v of the network receives a token from each of its neighbours having at least 2d+1 more tokens than v, where d is the maximum degree of any node of the network. The scheme guarantees that the algorithm balances to within a difference of O(d^2 log n/α) tokens in O(Δ/α) time steps, where n is the number of nodes of the network, Δ is the initial difference between the maximum and minimum processor loads, and α≤d is a parameter based on the topology of the network. In Ref. [5], networks and initial token distributions are shown for which these upper bounds are tight. For some network topologies, however, a better balance can be achieved.
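The local rule described above can be illustrated with a minimal sketch. It assumes a synchronous step in which all decisions are based on the loads at the start of the step; the function name `local_balance_step` and the adjacency-dictionary representation are illustrative choices, not taken from Refs. [4] or [5].

```python
def local_balance_step(loads, neighbours):
    """One synchronous step: each node v receives one token from every
    neighbour that holds at least 2d+1 more tokens than v."""
    d = max(len(adj) for adj in neighbours.values())  # maximum degree
    threshold = 2 * d + 1
    new_loads = dict(loads)
    for v, adj in neighbours.items():
        for u in adj:
            # Compare against the loads at the start of the step, so the
            # outcome does not depend on the order in which nodes are visited.
            if loads[u] - loads[v] >= threshold:
                new_loads[u] -= 1
                new_loads[v] += 1
    return new_loads


# Hypothetical example: a path of three nodes 0 - 1 - 2 (so d = 2).
path = {0: [1], 1: [0, 2], 2: [1]}
start = {0: 10, 1: 0, 2: 0}
after_one = local_balance_step(start, path)
print(after_one)
```

In this example only node 1 has a neighbour that exceeds it by at least 2d+1 = 5 tokens, so a single token moves from node 0 to node 1 in the first step.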
Another data distribution method that requires no global information is the so-called dimension-exchange method, which is based on the repetitive application of an extremely simple and scalable local exchange protocol. To be able to implement a dimension-exchange algorithm on a particular parallel architecture, the communication edges of the underlying topology must be partitionable (or colourable) into sets whereby no two edges of the same set are incident on the same processor. For networks having hypercube or mesh-connected topologies, the edges can be partitioned in a natural fashion, according to the dimension of the network along which the edge is oriented. For other networks, partitions may be based on sets of matchings [6].
Dimension-exchange algorithms use the edge-colouring of a network to pair processors for data exchange, and are invariably of the following general form:
Dimension-Exchange Algorithm:
LOOP
  FOR i = colour 1 to k (* k colours *)
    Over all pairs of processors connected by edges of colour i, compare values and exchange;
  END
END
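The general form above can be sketched in Python as follows. This is an illustration under stated assumptions, not an algorithm from the paper: the exchange rule `split_evenly` (the pair shares its tokens as evenly as possible, with the larger holder keeping any odd token), the two-colour partition of a linear array, and all function names are hypothetical.

```python
def dimension_exchange(loads, colour_classes, exchange, sweeps):
    """Repeat `sweeps` times: for each colour class in turn, apply `exchange`
    to every pair of processors joined by an edge of that colour."""
    loads = list(loads)
    for _ in range(sweeps):
        for edges in colour_classes:
            # Edges of one colour share no endpoint, so processing them
            # one after another gives the same result as doing so in parallel.
            for u, v in edges:
                loads[u], loads[v] = exchange(loads[u], loads[v])
    return loads


def split_evenly(a, b):
    """Assumed exchange rule: share a+b tokens as evenly as possible."""
    total = a + b
    high, low = (total + 1) // 2, total // 2
    return (high, low) if a >= b else (low, high)


def path_colouring(n):
    """Two colour classes for a linear array of n processors."""
    even = [(i, i + 1) for i in range(0, n - 1, 2)]
    odd = [(i, i + 1) for i in range(1, n - 1, 2)]
    return [even, odd]


def is_matching(edges):
    """True if no two edges in the set are incident on the same processor."""
    endpoints = [p for edge in edges for p in edge]
    return len(endpoints) == len(set(endpoints))


initial = [8, 0, 0, 0, 0, 0]
classes = path_colouring(len(initial))
final = dimension_exchange(initial, classes, split_evenly, sweeps=10)
print(final, sum(final))
```

With this particular rule a pair whose loads differ by exactly one token is left unchanged, so the sketch conserves tokens and never increases the discrepancy, but it is not guaranteed to reach the best possible balance.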
Because dimension-exchange techniques are simple and scalable, many researchers have studied their applicability to data-distribution problems. The first was Cybenko [11] in 1987, who proposed an algorithm for the d-dimensional hypercube under the assumption that the load in each PE was infinitely divisible, that is, a real-valued quantity able to be split among processors in an arbitrary fashion. Cybenko showed that if every exchange results in an equal sharing of the load between the two PEs involved, then after d iterations the PE loads would be perfectly balanced.
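The equal-sharing result for the hypercube can be checked with a small sketch. It assumes that PEs are numbered 0 to 2^d − 1 and that the neighbour of PE p along dimension i is obtained by flipping bit i of p; exact rational arithmetic stands in for infinitely-divisible loads. The function name is illustrative.

```python
from fractions import Fraction


def hypercube_equal_sharing(loads, d):
    """One pass over the d dimensions of a hypercube; each exchange leaves
    both PEs of a pair with the average of their two loads."""
    assert len(loads) == 2 ** d
    loads = [Fraction(x) for x in loads]
    for i in range(d):
        for p in range(2 ** d):
            q = p ^ (1 << i)  # neighbour of p along dimension i
            if p < q:         # handle each pair once
                mean = (loads[p] + loads[q]) / 2
                loads[p] = loads[q] = mean
    return loads


balanced = hypercube_equal_sharing([7, 0, 3, 1, 0, 0, 12, 1], d=3)
print(balanced)
```

After the exchange along dimension i, each PE holds the average over its subcube spanned by the dimensions processed so far, which is why d exchanges suffice. Here every PE ends with 24/8 = 3.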
This original work prompted a steady stream of research into the analysis of dimension-exchange algorithms. In 1988 Ranka et al. [12] studied the operation of Cybenko's algorithm empirically for the d-dimensional hypercube, under the more realistic assumption that the loads were finitely divisible, that is, representable as a set of tokens. They observed that the difference between the maximum and minimum number of tasks over all PEs of the network (called the discrepancy) would eventually fall to at most d. Soon after, Hosseini et al. [6] and Plaxton [13] confirmed this observation by providing an algorithm that after d steps reduced the discrepancy to at most d. In addition, Hosseini et al. demonstrated that, for infinitely-divisible loads, Cybenko's analysis could be generalized to arbitrary k-colourable networks.
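For indivisible tokens, the bound of d on the discrepancy can be observed with the following sketch. It is not necessarily the algorithm of Refs. [6] or [13]: it assumes a simple rule in which each pair splits its tokens into ceiling and floor halves, with the lower-numbered PE of the pair keeping the odd token. The names are illustrative.

```python
def hypercube_token_exchange(loads, d):
    """One pass over the d dimensions of a hypercube with integer tokens;
    each pair splits its combined tokens into ceil and floor halves."""
    assert len(loads) == 2 ** d
    loads = list(loads)
    for i in range(d):
        for p in range(2 ** d):
            q = p ^ (1 << i)  # neighbour of p along dimension i
            if p < q:         # handle each pair once
                total = loads[p] + loads[q]
                # Assumed tie-break: PE p (the lower index) keeps the odd token.
                loads[p], loads[q] = (total + 1) // 2, total // 2
    return loads


def discrepancy(loads):
    """Difference between the maximum and minimum load."""
    return max(loads) - min(loads)


d = 3
result = hypercube_token_exchange([7, 0, 3, 1, 0, 0, 12, 1], d)
print(result, discrepancy(result))
```

Each exchange can add at most one unit of rounding error to the difference between two PEs of the same subcube, which is the intuition behind the bound of d after d steps.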
In 1992 Xu and Lau [14] extended the work of Hosseini et al. by showing that for some topologies, the rate at which the global discrepancy converged to zero could be optimized by altering the ratio with which infinitely-divisible loads were locally balanced. They showed that the optimal ratios for the linear array, ring, 2-dimensional mesh and 2-dimensional torus all depend on their scales. These ratios were provided in an unpublished technical report appearing in the same year [15].
To date, a large body of results exists detailing the performance of the dimension-exchange approach over infinitely-divisible loads; on the other hand, little is known concerning dimension-exchange for finitely-divisible loads (tokens) on meshes and tori of constant degree. In this paper, we present asymptotically-optimal dimension-exchange algorithms for token distribution on the two-dimensional mesh and torus. For the n-by-n mesh, we prove that if the discrepancy is greater than 3, 16n steps of the algorithm suffice to reduce the discrepancy by 1, and if the discrepancy is equal to 3, 22n steps suffice. For the n-by-n torus when n is restricted to be even, we prove that if the discrepancy is greater than 4, 14n steps of the algorithm suffice to reduce the discrepancy by 1. These results are the first to establish that dimension-exchange techniques lead to optimal solutions for finitely-divisible load balancing on a mesh-connected network of constant degree.
The organization of the paper is as follows. In Section 2, we describe the model of computation. In Section 3, we prove a lower bound on the complexity of the token-distribution problem, and propose dimension-exchange algorithms for the mesh and the torus. The notation and preliminary concepts that we use in the analysis of the algorithms are introduced in Section 4. The analysis of the algorithm on the torus appears in Section 5, and in Section 6, the result for the torus is extended to the mesh. Concluding remarks are made in Section 7.
Model of computation
One of the simplest and most practical fixed connection networks is the single-port mesh-connected array. In this model, the processing elements (PEs) are arranged in a square grid, and are connected to their neighbours by unidirectional communication links. The PEs of the mesh may send or receive at most one message at any one time. This model is considerably weaker than the MIMD-model, where bidirectional links are assumed and concurrent communication to all the neighbours is allowed.
The
The token distribution problem and algorithm
The token distribution problem TD(A; Δ,M,δ) was first posed by Peleg and Upfal [16], and can be stated as follows: given a parallel architecture A containing n^2 processors P_1,…,P_{n^2}, with each processor P_i containing a stack of μ≤l(P_i)≤M tokens (for all 1≤i≤n^2) and with global discrepancy between loads Δ=M−μ, distribute the tokens such that at the end the global discrepancy has been reduced to at most δ.
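The quantities in this definition can be made concrete with a short sketch. It assumes the loads l(P_i) are stored as an n-by-n list of lists of non-negative integers; the helper names `global_discrepancy` and `is_td_solved` are hypothetical.

```python
def global_discrepancy(grid):
    """Delta = M - mu: the maximum load minus the minimum load."""
    values = [load for row in grid for load in row]
    return max(values) - min(values)


def is_td_solved(grid, delta):
    """True if the global discrepancy has been reduced to at most delta."""
    return global_discrepancy(grid) <= delta


# Hypothetical 3-by-3 instance with M = 9 and mu = 0, so Delta = 9.
initial = [
    [9, 0, 0],
    [0, 4, 0],
    [0, 0, 5],
]
# One possible redistribution of the same 18 tokens.
redistributed = [
    [2, 2, 2],
    [2, 2, 2],
    [2, 2, 2],
]
print(global_discrepancy(initial), is_td_solved(redistributed, 1))
```

Since tokens are neither created nor destroyed, a valid final state must contain the same total number of tokens as the initial one, which the example respects.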
Notation and preliminaries
In this section, we present notation and preliminary observations for the analysis of 2DEB over the torus.
Consider the situation where row i of torus initially contains one token per location, and all other rows contain no tokens. Over the course of 2n steps of 2DEB, the tokens migrate from row to row through the torus, shifting one location every other step. After the 2nth step, the tokens once again fill their starting row i. The direction of the migration depends on the parity of i: if i
Analysis of 2DEB on the torus
In this section, Algorithm 2DEB is proven to optimally solve token distribution problems TD(;Δ,M,δ) for tori , and δ≥4. When δ<4, there are instances for which Algorithm 2DEB fails; one such instance is shown in Fig. 4.
Lemma 5. Let be an n-by-n torus (n even) whose elements are non-negative integers, and let M and μ be the maximum and minimum values of these elements, respectively. If M−μ>2, then after n steps of Algorithm 2DEB on , no row or column of can contain piles α and β from
Extension of analysis to the mesh
In this section, we show how the results of Section 5 for the torus lead directly to a proof that Algorithm 2DEB optimally solves token distribution problems TD(;Δ,M,δ) for mesh , and Δ≥δ≥2. The result is obtained via a simulation of the mesh by a torus of twice the sidelength, to which the results of Section 5 are applied. For δ<2, clearly, there are token distribution problems TD(;Δ,M,δ) which cannot be solved by any algorithm.
Unlike the analysis of Section 5 for the torus, the analysis
Conclusion
In this paper, we presented a dimension-exchange data distribution algorithm and proved that it is asymptotically-optimal for token distribution on the two-dimensional mesh and torus. The benefits of the dimension-exchange approach, in that it is extremely simple, uses only locally-available information and is completely scalable, cannot be overstated. The analysis shows for the first time that dimension-exchange techniques can lead to optimal solutions for token distribution on a
References (16)
- et al., Analysis of graph coloring based distributed load balancing algorith, J. Parallel Distributed Comput. (1990)
- Dynamic load balancing for distributed memory multiprocessors, J. Parallel Distributed Comput. (1989)
- et al., Analysis of the generalized dimension exchange method for dynamic load balancing, J. Parallel Distributed Comput. (1992)
- D. Diderich, H. Gengler, S. Ubéda, An efficient algorithm for solving the token distribution problem on k-ary d-cube...
- F. Meyer auf der Heide, B. Oesterdiekhoff, R. Wanka, Strongly adaptive token distribution, in: Proceedings of the 20th...
- G. Turner, H. Schröder, Token distribution on reconfigurable d-dimensional meshes, in: Proc. 1st IEEE International...
- W. Aiello, B. Awerbuch, B. Maggs, S. Rao, Approximate load balancing on dynamic and asynchronous networks, in: Proc....
- B. Ghosh, F.T. Leighton, B.M. Maggs, S. Muthukrishnan, C.G. Plaxton, R. Rajaraman, A.W. Richa, R.E. Tarjan, D....