PaperA multi-level diffusion method for dynamic load balancing
Abstract
We consider the problem of dynamic load balancing for multiprocessors, for which a typical application is a parallel finite element solution method using non-structured grids and adaptive grid refinement. This type of application requires communication between the subproblems which arises from the interdependencies in the data. A load balancing algorithm should ideally not make any assumptions about the physical topology of the parallel machine. Further requirements are that the procedure should be fast and accurate. An new multi-level algorithm is presented for solving the dynamic load balancing problem which has these properties and whose parallel complexity is logarithmic in the number of processors used in the computation.
References (5)
- G Cybenko
Dynamic load balancing for distributed memory multiprocessors
J. Parallel Distributed Comput.
(1989) - J.E Boillat
Load balancing and Poisson equation in a Graph
Concurrency Practice Exper.
(1990)
Cited by (55)
Cluster-based communication and load balancing for simulations on dynamically adaptive grids
2014, Procedia Computer ScienceThe present paper introduces a new communication and load-balancing scheme based on a clustering of the grid which we use for the efficient parallelization of simulations on dynamically adaptive grids.
With a partitioning based on space-filling curves (SFCs), this yields several advantageous properties regarding the memory requirements and load balancing. However, for such an SFC- based partitioning, additional connectivity information has to be stored and updated for dynamically changing grids.
In this work, we present our approach to keep this connectivity information run-length encoded (RLE) only for the interfaces shared between partitions. Using special properties of the underlying grid traversal and used communication scheme, we update this connectivity information implicitly for dynamically changing grids and can represent the connectivity information as a sparse communication graph: graph nodes (partitions) represent bulks of connected grid cells and each graph edge (RLE connectivity information) a unique relation between adjacent partitions. This directly leads to an efficient shared-memory parallelization with graph nodes assigned to computing cores and an efficient en bloc data exchange via graph edges. We further refer to such a partitioning approach with RLE meta information as a cluster-based domain decomposition and to each partition as a cluster. With the sparse communication graph in mind, we then extend the connectivity information represented by the graph edges with MPI ranks, yielding an en bloc communication for distributed-memory systems and a hybrid parallelization. For data migration, the stack-based intra-cluster communication allows a very low memory footprint for data migration and the RLE leads to efficient updates of connectivity information.
Our benchmark is based on a shallow water simulation on a dynamically adaptive grid. We conducted performance studies for MPI-only and hybrid parallelizations, yielding an efficiency of over 90% on 256 cores. Furthermore, we demonstrate the applicability of cluster-based optimizations on distributed-memory systems.
Dynamic load balancing algorithm for molecular dynamics based on Voronoi cells domain decompositions
2012, Computer Physics CommunicationsWe present a new algorithm for automatic parallel load balancing in classical molecular dynamics. It assumes a spatial domain decomposition of particles into Voronoi cells. It is a gradient method which attempts to minimize a cost function by displacing Voronoi sites associated with each processor/sub-domain along steepest descent directions. Excellent load balance has been obtained for quasi-2D and 3D practical applications, with up to particles on 65,536 MPI tasks.
Dimension-exchange algorithms for token distribution on tree-connected architectures
2004, Journal of Parallel and Distributed ComputingLoad balancing on a multi-processor system involves redistributing tasks among processors so that each processor has roughly the same amount of work to perform. The token-distribution problem is a static variant of the load balancing problem for the case in which the workloads in the system cannot be divided arbitrarily; that is, where each token represents an atomic element of work. A scalable method for distributing tokens over a parallel architecture is the so-called dimension-exchange approach. Our results include improved analysis of two existing dimension-exchange algorithms for token distribution on arbitrary graphs and on arbitrary trees, respectively. In particular, we establish a logarithmic upper bound on the discrepancy of the resulting distribution when the second algorithm is applied to an arbitrary initial distribution on a tree. We then present a new dimension-exchange algorithm for token distribution on trees, which assuming each node knows the number of nodes in the tree, determines a ‘perfectly balanced’ distribution. Furthermore, the rate of convergence is worst-case optimal for trees of bounded degree. Note that an algorithm for token-distribution on trees is applicable to arbitrary architectures, since the algorithm can be applied on a spanning tree of any given connected graph.
Dynamic load balancing by diffusion in heterogeneous systems
2004, Journal of Parallel and Distributed ComputingThe distributed environments constitute a major option for the future development of high-performance computing. In order to be able to efficiently execute parallel applications on such systems, one should ensure a fair utilization of the available resources. Here, we address a number of aspects regarding the generalization of the diffusion algorithms for the case when the processors have different relative speeds and the communication parameters have different values. Although some work has been done in this direction, we propose complementary results and we investigate other variants than those commonly used. In a first step, we discuss general aspects of the generalized diffusion. Bounds are formulated for the convergence factor and an explicit expression is given for the migration flow generated by such algorithms. It is shown that this flow has an important property, that is a scaled projection of all other balancing flows. In the second part, a variant of generalized diffusion is investigated. Complexity results are formulated and it is shown that this algorithm theoretically converges faster than the hydrodynamic algorithm. Comparative tests between different variants of generalized diffusion algorithms are performed.
A novel dynamic load balancing scheme for parallel systems
2002, Journal of Parallel and Distributed ComputingAdaptive mesh refinement (AMR) is a type of multiscale algorithm that achieves high resolution in localized regions of dynamic, multidimensional numerical simulations. One of the key issues related to AMR is dynamic load balancing (DLB), which allows large-scale adaptive applications to run efficiently on parallel systems. In this paper, we present an efficient DLB scheme for structured AMR (SAMR) applications. This scheme interleaves a grid-splitting technique with direct grid movements (e.g., direct movement from an overloaded processor to an underloaded processor), for which the objective is to efficiently redistribute workload among all the processors so as to reduce the parallel execution time. The potential benefits of our DLB scheme are examined by incorporating our techniques into a SAMR cosmology application, the ENZO code. Experiments show that by using our scheme, the parallel execution time can be reduced by up to 57% and the quality of load balancing can be improved by a factor of six, as compared to the original DLB scheme used in ENZO.
Dynamic load balancing in computational mechanics
2000, Computer Methods in Applied Mechanics and EngineeringIn many important computational mechanics applications, the computation adapts dynamically during the simulation. Examples include adaptive mesh refinement, particle simulations and transient dynamics calculations. When running these kinds of simulations on a parallel computer, the work must be assigned to processors in a dynamic fashion to keep the computational load balanced. A number of approaches have been proposed for this dynamic load balancing problem. This paper reviews the major classes of algorithms and discusses their relative merits on problems from computational mechanics. Shortcomings in the state-of-the-art are identified and suggestions are made for future research directions.