Abstract
Finite Element Methods (FEM) are widely used in academia and industry, especially in mechanical engineering, civil engineering, aerospace, and electrical engineering. These methods usually convert partial differential equations into large sparse linear systems. For complex problems, solving these large sparse linear systems is a time-consuming process. This paper presents a parallelized iterative solver for large sparse linear systems implemented on a GPGPU cluster. Traditionally, these problems do not scale well on GPGPU clusters. The paper presents an approach that reduces the communication between cluster compute nodes for these solvers. Additionally, computation and communication are overlapped to reduce the impact of data exchange. The parallelized system achieved a speedup of up to 15.3 times on 16 NVIDIA Tesla GPUs, compared to a single GPU. An analytical evaluation of the algorithm is also conducted, and the analytical equations for predicting the performance are presented and validated.
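To put the reported speedup in perspective, it can be read as a parallel efficiency. The calculation below is a rough illustration that assumes the standard definition of efficiency as speedup divided by the number of processors; the symbols S, p, and E are introduced here for illustration and do not come from the abstract.

```latex
% Assumed notation (not from the abstract):
%   S = speedup over a single GPU, p = number of GPUs, E = parallel efficiency
\[
  E = \frac{S}{p} = \frac{15.3}{16} \approx 0.956
\]
```

Under that assumption, the best reported case corresponds to roughly 96% parallel efficiency on 16 GPUs. Since the abstract states the speedup as "up to" 15.3 times, other configurations may fall below this figure.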
Cite this article
Chen, C., Taha, T.M. A communication reduction approach to iteratively solve large sparse linear systems on a GPGPU cluster. Cluster Comput 17, 327–337 (2014). https://doi.org/10.1007/s10586-013-0279-2
DOI: https://doi.org/10.1007/s10586-013-0279-2