A GPU parallelized spectral method for elliptic equations in rectangular domains

doi:10.1016/j.jcp.2013.05.031

Journal of Computational Physics

Volume 250, 1 October 2013, Pages 555-564


Abstract

We design and implement a polynomial-based spectral method on graphics processing units (GPUs). The key to its success lies in the seamless integration of the matrix diagonalization technique with new-generation CUDA tools. The method is applicable to elliptic equations in rectangular domains with general boundary conditions. We show remarkable speedups of up to 15 times in the 2-D case and more than 35 times in the 3-D case.

Introduction

General-purpose computing on graphics processing units (GPGPU) has recently drawn much attention from the scientific computing community. The June 2012 TOP500 list of supercomputers shows that more than 10% of them are now powered by NVIDIA Tesla GPUs (cf. [4]). Thanks to their high core density and wide-vector SIMD architecture, GPUs can deliver performance that is very hard for a conventional CPU to achieve, especially for workloads with high and regular throughput (cf. [3]). Researchers in computational science are therefore increasingly interested in exploring efficient parallel strategies on GPUs.

There are many successful GPU implementations of numerical methods for partial differential equations, including fast multipole methods [10], nodal discontinuous Galerkin methods [12], finite difference methods [6], finite element methods [24], [14], [13], and Fourier spectral methods [9], [5]. However, to the best of the authors' knowledge, little if any effort has been devoted to implementing non-Fourier spectral methods on GPUs for problems with non-periodic boundary conditions. The aim of this paper is to explore how spectral methods can be implemented efficiently with GPGPU. In particular, we design and implement an efficient matrix diagonalization method with spectral discretizations on the GPU for solving elliptic and parabolic equations with non-periodic boundary conditions on a d-dimensional rectangular domain ($d = 2, 3$).

It is well known that, for separable elliptic equations, the so-called matrix diagonalization technique (cf. [16], [11], [18]) solves the linear systems arising from a spectral discretization in $O(N^{d+1})$ operations, where N is the number of points in each dimension. For time-dependent problems, these solvers are called repeatedly. Fortunately, the core of the matrix diagonalization process consists of matrix–matrix multiplications, which can be parallelized efficiently on the GPU; more precisely, they can be handled by CUBLAS, the BLAS library among the new-generation CUDA tools.
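The sketch below illustrates the 2-D solve in plain Python, with no external libraries. To stay dependency-free it swaps in a second-order finite-difference second-derivative matrix on (-1, 1), whose eigenpairs are known in closed form (sine vectors), for the Chebyshev or Legendre collocation matrix whose eigenpairs a spectral code would compute numerically; the diagonalization steps themselves are the same. The names `matmul`, `fd_eigenpairs` and `solve_2d` are hypothetical, and `matmul` merely stands in for a GEMM call of the kind CUBLAS provides.

```python
import math


def matmul(A, B):
    """Dense matrix product of list-of-lists matrices.
    Stands in for a BLAS/CUBLAS GEMM call in this sketch."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def fd_eigenpairs(n):
    """Closed-form eigenpairs of the second-difference matrix
    D = tridiag(1, -2, 1) / h^2 on n interior points of (-1, 1), h = 2/(n+1).
    (Assumption: a stand-in for the spectral collocation matrix, whose
    eigenpairs would instead be computed numerically once on the host.)
    Column m of S is the eigenvector belonging to lam[m]."""
    h = 2.0 / (n + 1)
    lam = [-(4.0 / h**2) * math.sin((m + 1) * math.pi / (2 * (n + 1))) ** 2
           for m in range(n)]
    S = [[math.sin((k + 1) * (m + 1) * math.pi / (n + 1)) for m in range(n)]
         for k in range(n)]
    return lam, S


def solve_2d(alpha, F, lam, S):
    """Solve alpha*U - D U - U D^T = F (homogeneous Dirichlet data) by
    matrix diagonalization: transform, divide pointwise, transform back."""
    n = len(F)
    c = 2.0 / (n + 1)                # S is symmetric and S S = I / c, so S^{-1} = c S
    G = matmul(matmul(S, F), S)      # forward transform: c^2 * G = S^{-1} F S^{-T}
    V = [[c * c * G[i][j] / (alpha - lam[i] - lam[j]) for j in range(n)]
         for i in range(n)]          # pointwise division by the eigenvalue sums
    return matmul(matmul(S, V), S)   # back-transform: U = S V S^T (= S V S here)
```

The four calls to `matmul` are the $O(N^{3})$ part of the solve; the pointwise division between them costs only $O(N^{2})$ and is trivially parallel, which is why offloading the products to a GPU BLAS pays off.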

Our numerical experiments show that the speedup of the GPU over the CPU grows as N increases. For example, the speedup reaches a factor of 15 with $N = 2^{12}$ in the 2-D case and a factor of 37 with $N = 2^{9}$ in the 3-D case, where N is the number of points in each direction. In addition, the computing times of the different parts of the algorithm scale consistently with their complexity. In our algorithm, the throughput is regular and global memory accesses are fully coalesced, which indicates good performance on the GPU.
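A back-of-the-envelope operation count (our own estimate, not taken from the paper) shows where the $O(N^{d+1})$ scaling comes from. Counting a dense product of an $m \times k$ and a $k \times n$ matrix as $2mkn$ floating-point operations and ignoring the cheaper pointwise division, the 2-D solve needs four $N \times N$ by $N \times N$ products, while the 3-D solve needs a forward and a backward transform along each of the three directions, each equivalent to an $(N \times N)$ by $(N \times N^{2})$ product:

```latex
\begin{aligned}
W_{2D} &\approx 4 \cdot 2N^{3} = 8N^{3} = O(N^{3}), \\
W_{3D} &\approx 6 \cdot 2N^{4} = 12N^{4} = O(N^{4}).
\end{aligned}
```

For the largest cases quoted above this gives $W_{2D} \approx 8 \cdot 2^{36} \approx 5.5 \times 10^{11}$ flops for $N = 2^{12}$ and $W_{3D} \approx 12 \cdot 2^{36} \approx 8.2 \times 10^{11}$ flops for $N = 2^{9}$, so the two runs involve work of the same order per solve.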

To initialize the matrix diagonalization procedure, we first compute the eigenvalues and eigenvectors of the 1-D problems on the host and then transfer them to the device, since they are used repeatedly in a typical time-marching scheme for evolutionary problems.
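The following plain-Python sketch mirrors this setup-once, solve-many pattern in 1-D. The class `DiagonalizedSolver1D` and the function `implicit_euler_heat` are hypothetical names, and as a dependency-free stand-in for the spectral operator the sketch uses the finite-difference second-derivative matrix on (-1, 1) with homogeneous Dirichlet data, whose eigenpairs are known in closed form. The eigen-decomposition is computed once in the constructor (the analogue of the host-side precomputation); each backward-Euler step for $u_t = u_{xx}$ then reduces to the model problem $\alpha u - u_{xx} = f$ with $\alpha = 1/\Delta t$ and $f = u^{k}/\Delta t$, solved with the cached eigenpairs.

```python
import math


class DiagonalizedSolver1D:
    """Hypothetical sketch: cache the eigen-decomposition of the 1-D operator
    once, then reuse it for every solve. D = tridiag(1, -2, 1) / h^2 on n
    interior points of (-1, 1) stands in for the spectral operator."""

    def __init__(self, n):
        self.n = n
        h = 2.0 / (n + 1)
        # Eigenvalue of mode m + 1 (closed form for this finite-difference D).
        self.lam = [-(4.0 / h**2) * math.sin((m + 1) * math.pi / (2 * (n + 1))) ** 2
                    for m in range(n)]
        # S[k][m] = sin((k+1)(m+1) pi / (n+1)); S is symmetric, S^{-1} = 2/(n+1) * S.
        self.S = [[math.sin((k + 1) * (m + 1) * math.pi / (n + 1)) for m in range(n)]
                  for k in range(n)]

    def _apply_S(self, v):
        return [sum(self.S[k][m] * v[m] for m in range(self.n)) for k in range(self.n)]

    def solve(self, alpha, f):
        """Solve alpha*u - D u = f using the cached eigenpairs."""
        c = 2.0 / (self.n + 1)
        g = self._apply_S(f)                    # c * g = S^{-1} f (forward transform)
        v = [c * gm / (alpha - lm) for gm, lm in zip(g, self.lam)]
        return self._apply_S(v)                 # back-transform u = S v


def implicit_euler_heat(u0, dt, steps):
    """March u_t = u_xx with backward Euler: (1/dt) u^{k+1} - D u^{k+1} = u^k / dt."""
    solver = DiagonalizedSolver1D(len(u0))      # setup done once, outside the loop
    u = list(u0)
    for _ in range(steps):
        u = solver.solve(1.0 / dt, [x / dt for x in u])
    return u
```

A convenient check is to start from a single eigenvector: each step should then multiply it by exactly $1/(1 - \Delta t\,\lambda_1)$, with $\lambda_1$ the corresponding eigenvalue.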

This paper is organized as follows. Section 2 describes the matrix diagonalization method, using the spectral-collocation method as an example; this provides the background for the in-depth discussion in Section 3 of the GPU parallelization of the same algorithm. In Section 4, we present benchmark results for solving model elliptic equations in 2-D and 3-D, and for solving 2-D multiphase flows based on the coupled Navier–Stokes and Allen–Cahn equations.


Matrix diagonalization method for spectral discretizations

In this section, we describe the matrix diagonalization method for spectral discretizations of separable equations. To fix ideas, we consider the following model equation on the rectangular domain $\Omega = (-1, 1)^{d}$ ($d = 2, 3$):

$$\begin{cases} \alpha u - \Delta u = f, & \text{in } \Omega, \\ u|_{\partial \Omega} = 0. \end{cases}$$
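For $d = 2$, the diagonalization step can be written out as follows. We assume here (our notation, not necessarily the paper's) that $A$ is the $M \times M$ 1-D second-derivative collocation matrix with the boundary values eliminated, diagonalizable as $A = P \Lambda P^{-1}$ with $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_M)$, and that $U_{ij} \approx u(x_i, y_j)$ and $F_{ij} = f(x_i, y_j)$ at the interior nodes:

```latex
\begin{aligned}
&\alpha U - A U - U A^{T} = F, \qquad U = P V P^{T} \\
&\Longrightarrow\; P\,(\alpha V - \Lambda V - V \Lambda)\,P^{T} = F \\
&\Longrightarrow\; V_{ij} = \frac{G_{ij}}{\alpha - \lambda_i - \lambda_j}, \qquad G = P^{-1} F P^{-T}, \qquad U = P V P^{T}.
\end{aligned}
```

In 3-D the same argument gives $V_{ijk} = G_{ijk} / (\alpha - \lambda_i - \lambda_j - \lambda_k)$, with $P^{-1}$ and $P$ applied along each of the three directions. The solve is thus a forward transform, a pointwise division and a back-transform, and the transforms are matrix–matrix products. The division is well defined when the denominators do not vanish; for $\alpha \ge 0$ this holds if all $\lambda_i$ are negative, as is the case for the Chebyshev collocation second-derivative operator with Dirichlet conditions (cf. Gottlieb et al., SIAM Journal on Numerical Analysis, 1983).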

Design and implementation of spectral methods on the GPU

We now discuss in detail the design and GPU implementation of the algorithm described in the previous section. There are three basic strategies for optimizing performance on GPUs (cf. [1], [2]): (i) maximizing memory throughput; (ii) maximizing utilization; and (iii) maximizing instruction throughput. In Section 3.1, we address strategy (i) and describe the overall GPU parallelization strategy for our algorithm. In Sections 3.2 and 3.3, we focus on strategy (ii), by using both CUBLAS and a
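In 3-D, the forward and backward transforms apply a 1-D matrix $P$ along each direction of an $N \times N \times N$ array. One way to cast each of these as dense matrix–matrix products (the operation CUBLAS accelerates) is to reinterpret the contiguous array as a 2-D matrix, as in the plain-Python sketch below. It is our own illustration, not the paper's kernel design: it assumes row-major storage with entry $(i, j, k)$ at offset $iN^{2} + jN + k$, and the function names are hypothetical. CUBLAS itself uses column-major storage, which swaps the roles of the first and last directions.

```python
def matmul(A, B):
    """Dense matrix product of list-of-lists matrices (stand-in for a GEMM call)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def apply_along_axis0(P, U, n):
    """W[i,j,k] = sum_l P[i][l] U[l,j,k]: one (n x n) * (n x n^2) product."""
    M = [U[l * n * n:(l + 1) * n * n] for l in range(n)]   # row l holds U[l, :, :]
    return [x for row in matmul(P, M) for x in row]


def apply_along_axis2(P, U, n):
    """W[i,j,k] = sum_l P[k][l] U[i,j,l]: one (n^2 x n) * (n x n) product with P^T."""
    M = [U[r * n:(r + 1) * n] for r in range(n * n)]       # row r = i*n + j holds U[i, j, :]
    Pt = [list(col) for col in zip(*P)]
    return [x for row in matmul(M, Pt) for x in row]


def apply_along_axis1(P, U, n):
    """W[i,j,k] = sum_l P[j][l] U[i,l,k]: a batch of n independent (n x n) products,
    one per slab i."""
    out = []
    for i in range(n):
        slab = [U[i * n * n + l * n:i * n * n + (l + 1) * n] for l in range(n)]  # slab[l][k] = U[i,l,k]
        out.extend(x for row in matmul(P, slab) for x in row)
    return out
```

The first and last directions each correspond to a single large product on a reshaped view of the contiguous data; the middle direction is handled here as a batch of $N$ independent $N \times N$ products. An explicit transpose is an alternative that trades extra memory traffic for a single large product.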

Numerical experiments

In the first subsection, we demonstrate the performance of our spectral-collocation method on the GPU for model elliptic equations in 2-D and 3-D. In the second subsection, we apply the algorithm to the Navier–Stokes and Allen–Cahn equations through an energy-stable scheme, which leads to a sequence of Poisson-type equations at each time step. The CPU and GPU used are described at the beginning of Section 3.

Acknowledgments

The authors would first like to thank the referees for their helpful suggestions and comments. Feng Chen would like to thank Professor Xiaofeng Yang at the University of South Carolina and Yuhang Tang at Brown University for their help. The authors acknowledge the generous facility support from the National Institute for Computational Sciences (NICS) at Oak Ridge National Laboratory and the Brown University Center for Computing and Visualization (CCV).

References

H. Bauke et al., Accelerating the Fourier split operator method via graphics processing units, Computer Physics Communications (2011).
N.A. Gumerov et al., Fast multipole methods on graphics processors, Journal of Computational Physics (2008).
P. Haldenwang et al., Chebyshev 3-d spectral and 2-d pseudospectral solvers for the Helmholtz equation, Journal of Computational Physics (1984).
D. Komatitsch et al., High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster, Journal of Computational Physics (2010).
C. Liu et al., A phase field model for the mixture of two incompressible fluids and its approximation by a Fourier-spectral method, Physica D: Nonlinear Phenomena (2003).
X. Yang et al., Numerical simulations of jet pinching-off and drop formation using an energetic variational phase-field method, Journal of Computational Physics (2006).
CUDA C best practices guide. NVIDIA Corporation,...
CUDA C programming guide. NVIDIA Corporation,...
CUDA community showcase, July 2012....
New TOP500 list, June 2012....
D.M. Dang, C.C. Christara, and K.R. Jackson, Parallel implementation on GPUs of ADI finite difference methods for...
D. Gottlieb et al., The spectrum of the Chebyshev collocation operator for the heat equation, SIAM Journal on Numerical Analysis (1983).
J.L. Guermond et al., An overview of projection methods for incompressible flows, Computer Methods in Applied Mechanics and Engineering (2006).


☆ The research of Feng Chen is partly supported by OSD/AFOSR Grant FA9550-09-1-0613 and AFOSR Grant FA9550-12-1-0463. The research of Jie Shen is partly supported by NSF Grant DMS-1217066 and AFOSR Grant FA9550-11-1-0328.
