
A balanced accumulation scheme for parallel PDE solvers

Computing and Visualization in Science

Abstract

We present a tailored load balancing technique that addresses specific performance issues in the boundary data accumulation algorithm for non-overlapping domain decompositions. The technique is used to speed up a parallel conjugate gradient algorithm with an algebraic multigrid preconditioner, which solves a potential problem on an unstructured tetrahedral finite element mesh. The optimized accumulation algorithm significantly improves the performance of the parallel solver: benchmark runs with up to 48 MPI processes show runtime improvements of up to 50 % over the standard approach. The load balancing problem itself is a global optimization problem. It is solved approximately, in parallel, by local optimization algorithms that require no communication during the optimization process.
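The abstract does not spell out the accumulation step or the balancing objective, so the following Python sketch only illustrates the general idea under stated assumptions. In a non-overlapping decomposition, a vector can be stored in distributed form, where each subdomain holds a partial value at the interface nodes it shares. Accumulation sums those partial values so that every sharing subdomain ends up with the full value. The owner-based exchange, its cost model, and the greedy reassignment in `balance` are hypothetical illustrations, not the authors' scheme. All function names (`accumulate`, `comm_load`, `balance`) are invented for this example, and the sketch runs sequentially rather than as a communication-free parallel optimization.

```python
"""Sequential sketch: interface accumulation and a greedy ownership balance.

Hypothetical cost model (not taken from the paper): each shared interface node
has one owner; non-owners send their partial value to the owner and receive the
sum back (2 values each), the owner receives and returns k - 1 values (2(k - 1)).
"""


def accumulate(local, sharers):
    """Convert distributed values to accumulated values.

    local[p][n]: partial value of node n held by process p.
    sharers[n]: processes whose subdomains contain interface node n.
    Interior nodes (not in sharers) keep their local value.
    """
    totals = {n: sum(local[p][n] for p in procs) for n, procs in sharers.items()}
    return {p: {n: totals.get(n, v) for n, v in vals.items()}
            for p, vals in local.items()}


def comm_load(owner, sharers, nprocs):
    """Number of values each process sends plus receives under the owner scheme."""
    load = [0] * nprocs
    for n, procs in sharers.items():
        k = len(procs)
        for p in procs:
            load[p] += 2 * (k - 1) if p == owner[n] else 2
    return load


def balance(owner, sharers, nprocs, max_sweeps=100):
    """Greedy local search: hand a node's ownership to its least-loaded sharer
    whenever that strictly lowers the larger of the two affected loads."""
    owner = dict(owner)
    load = comm_load(owner, sharers, nprocs)
    for _ in range(max_sweeps):
        moved = False
        for n, procs in sharers.items():
            shift = 2 * (len(procs) - 1) - 2  # extra cost of owning vs. not owning
            if shift <= 0:
                continue  # nodes shared by two processes cost the same either way
            a = owner[n]
            b = min(procs, key=lambda p: load[p])  # least-loaded sharer
            if load[b] + shift < load[a]:
                load[a] -= shift
                load[b] += shift
                owner[n] = b
                moved = True
        if not moved:
            break
    return owner, load


if __name__ == "__main__":
    # Four subdomains; interface node 3 is a cross point shared by all of them.
    sharers = {0: [0, 1, 2], 1: [0, 1, 3], 2: [0, 2, 3], 3: [0, 1, 2, 3]}
    nprocs = 4
    # Distributed format: process p contributes p + 1 to every node it shares.
    local = {p: {n: float(p + 1) for n, procs in sharers.items() if p in procs}
             for p in range(nprocs)}
    print(accumulate(local, sharers)[0][3])  # 10.0
    naive = {n: min(procs) for n, procs in sharers.items()}  # lowest rank owns
    print(comm_load(naive, sharers, nprocs))  # [18, 6, 6, 6]
    owner, load = balance(naive, sharers, nprocs)
    print(load)  # [12, 8, 8, 8]
```

In this toy example, the "lowest rank owns" rule concentrates the exchange work on process 0. The local moves spread it out without changing the total volume. Each accepted move strictly decreases the sum of squared loads, so the sweep terminates, and each decision only inspects the loads of the node's own sharers.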





Acknowledgments

The research was supported by the Austrian Science Fund FWF project SFB F032 “Mathematical Optimization and Applications in Biomedical Sciences”.

Author information


Correspondence to Manfred Liebmann.

Additional information

Communicated by Gabriel Wittum.


About this article


Cite this article

Liebmann, M., Neic, A. & Haase, G. A balanced accumulation scheme for parallel PDE solvers. Comput. Visual Sci. 16, 33–40 (2013). https://doi.org/10.1007/s00791-014-0222-y



  • DOI: https://doi.org/10.1007/s00791-014-0222-y
