Abstract
We present a tailored load-balancing technique that addresses specific performance issues in the boundary data accumulation algorithm for non-overlapping domain decompositions. The technique speeds up a parallel conjugate gradient algorithm with an algebraic multigrid preconditioner that solves a potential problem on an unstructured tetrahedral finite element mesh. The optimized accumulation algorithm significantly improves the performance of the parallel solver: in benchmark runs with up to 48 MPI processes, runtime improves by up to 50% over the standard approach. The load-balancing problem itself is a global optimization problem. It is solved approximately by local optimization algorithms that run in parallel and require no communication during the optimization process.
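As a rough illustration of the accumulation step named above, the following serial Python sketch sums the contributions that several subdomains hold for the same interface node, and then assigns each shared node to one of its subdomains so that the summation work is spread out. This is not the method of the paper: the function names (`accumulate`, `greedy_owner_assignment`), the data layout, and the cost model (one addition per extra copy of a node) are assumptions made for illustration, and a simple greedy heuristic stands in for the local optimization algorithms mentioned in the abstract.

```python
from collections import defaultdict


def accumulate(local_values):
    """Sum the contributions of all subdomains at each node.

    local_values: one dict per subdomain, mapping a global node id to that
    subdomain's local contribution (hypothetical layout).
    Returns one dict per subdomain in which every copy of a node holds the
    accumulated (summed) value.
    """
    totals = defaultdict(float)
    for sub in local_values:
        for node, value in sub.items():
            totals[node] += value
    return [{node: totals[node] for node in sub} for sub in local_values]


def greedy_owner_assignment(shared_nodes, num_subdomains):
    """Assign each shared node to one of the subdomains that share it.

    shared_nodes: dict mapping a node id to the list of subdomains sharing it.
    Assumed cost model: the owner performs one addition per other copy.
    Nodes are processed from most to least shared, and each goes to the
    currently least-loaded sharer (ties broken by lowest subdomain index).
    """
    load = [0] * num_subdomains
    owner = {}
    order = sorted(shared_nodes, key=lambda n: (-len(shared_nodes[n]), n))
    for node in order:
        sharers = shared_nodes[node]
        best = min(sharers, key=lambda s: (load[s], s))
        owner[node] = best
        load[best] += len(sharers) - 1
    return owner, load


if __name__ == "__main__":
    # Three subdomains; node 1 is shared by all three, node 2 by two of them.
    local = [{0: 1.0, 1: 2.0}, {1: 3.0, 2: 4.0}, {1: 0.5, 2: 1.0}]
    print(accumulate(local))
    print(greedy_owner_assignment({1: [0, 1, 2], 2: [1, 2]}, 3))
```

In a distributed setting the sums would be formed by exchanging interface data between MPI processes rather than through a shared dictionary. The sketch only makes visible which process does how much summation work, which is the quantity a balancing scheme of this kind tries to even out.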
Acknowledgments
The research was supported by the Austrian Science Fund FWF project SFB F032 “Mathematical Optimization and Applications in Biomedical Sciences”.
Additional information
Communicated by Gabriel Wittum.
About this article
Cite this article
Liebmann, M., Neic, A. & Haase, G. A balanced accumulation scheme for parallel PDE solvers. Comput. Visual Sci. 16, 33–40 (2013). https://doi.org/10.1007/s00791-014-0222-y
DOI: https://doi.org/10.1007/s00791-014-0222-y