Using LAMA for efficient AMG on hybrid clusters

Kraus, Jiri; Förster, Malte; Brandes, Thomas; Soddemann, Thomas

doi:10.1007/s00450-012-0223-3

Special Issue Paper
Published: 23 May 2012

Computer Science - Research and Development, Volume 28, pages 211–220 (2013)

Abstract

In this paper, we describe the implementation of an AMG solver for hybrid clusters that exploits both distributed and shared memory parallelization and uses the GPU accelerators available on each node. The solver is written with LAMA (Library for Accelerated Math Applications). This library not only provides an easy-to-use framework for solvers that may run on different devices with different matrix formats, but also offers features to optimize and hide communication and memory transfers between CPUs and GPUs. We explain these features and show their impact on the efficiency of the AMG solver. The benchmark results show that hybrid clusters can be used efficiently even for multi-level methods like AMG, where fast solutions are needed on all levels, that is, for multiple problem sizes.
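
The idea of hiding communication behind computation can be illustrated with a distributed sparse matrix-vector product. The sketch below is not LAMA's API and is not taken from the paper; it is a minimal illustration under stated assumptions. Each process is assumed to split its rows into a local part (columns it owns) and a halo part (columns owned elsewhere). The names `spmv_csr`, `fetch_halo`, and `overlapped_spmv` are hypothetical, and a Python thread stands in for what would be a non-blocking message or an asynchronous CPU/GPU copy on a real hybrid cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def spmv_csr(row_ptr, col_idx, values, x):
    """Return y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def fetch_halo(remote_x, halo_indices):
    """Hypothetical stand-in for a non-blocking receive or device copy."""
    return [remote_x[j] for j in halo_indices]


def overlapped_spmv(local, halo, x_local, remote_x, halo_indices):
    """Compute the local product while the halo values are in flight."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1. Start the transfer of the remote vector entries.
        pending = pool.submit(fetch_halo, remote_x, halo_indices)
        # 2. Meanwhile, multiply with the part that needs no remote data.
        y = spmv_csr(*local, x_local)
        # 3. Wait for the transfer only when its data is actually needed.
        x_halo = pending.result()
    # 4. Add the contribution of the halo columns.
    y_halo = spmv_csr(*halo, x_halo)
    return [a + b for a, b in zip(y, y_halo)]


# Rows 0 and 1 of a 4x4 tridiagonal matrix with stencil (-1, 2, -1).
# Columns 0 and 1 are owned locally, columns 2 and 3 by another process.
local = ([0, 2, 4], [0, 1, 0, 1], [2, -1, -1, 2])  # (row_ptr, col_idx, values)
halo = ([0, 0, 1], [0], [-1])                      # row 1 touches global column 2
x_local = [3, 1]
remote_x = [2, 5]                                  # entries owned elsewhere
y = overlapped_spmv(local, halo, x_local, remote_x, halo_indices=[0])
print(y)  # [5.0, -3.0]
```

The benefit of the split is that step 2 needs no remote data, so its run time can cover the transfer started in step 1. Whether this pays off on every level of a multigrid hierarchy depends on how much local work each level has relative to its communication volume.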

Notes

Open MPI 1.4.4. Tests with Intel MPI 4.0.2.003 showed similar behavior.

Author information

Authors and Affiliations

Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Schloss Birlinghoven, 53754, Sankt Augustin, Germany
Jiri Kraus, Malte Förster, Thomas Brandes & Thomas Soddemann

Corresponding author

Correspondence to Jiri Kraus.

Additional information

Granted by Fraunhofer, ITEA2 project H4H—BMBF 01|S10036H, BMBF project GASPI 01|H11007F.

About this article

Cite this article

Kraus, J., Förster, M., Brandes, T. et al. Using LAMA for efficient AMG on hybrid clusters. Comput Sci Res Dev 28, 211–220 (2013). https://doi.org/10.1007/s00450-012-0223-3

Issue Date: May 2013
DOI: https://doi.org/10.1007/s00450-012-0223-3
