Elsevier

Journal of Computational Physics

Volume 250, 1 October 2013, Pages 555-564
Journal of Computational Physics

A GPU parallelized spectral method for elliptic equations in rectangular domains

https://doi.org/10.1016/j.jcp.2013.05.031Get rights and content

Abstract

We design and implement a polynomial-based spectral method on graphic processing units (GPUs). The key to success lies in the seamless integration of the matrix diagonalization technique and the new generation CUDA tools. The method is applicable to elliptic equations in rectangular domains with general boundary condition. We show remarkable speedups of up to 15 times in the 2-D case and more than 35 times in the 3-D case.

Introduction

General-purpose computing on graphics processing units (GPGPU) has drawn much attention recently from the scientific computing community. The new list of top 500 supercomputers shows that more than 10% of them are now powered by NVIDIA Tesla GPUs (cf. [4]). Thanks to the high core density and the wide vector width SIMD architecture, using GPGPU can yield performance that is very hard for a conventional CPU to achieve, especially for high and regular throughput workloads (cf. [3]). Therefore, researchers in computational sciences are more and more interested in exploring efficient parallel strategies on GPUs.

There exist many successful GPU implementations of numerical methods for partial differential equations, to name a few, fast multipole methods [10], nodal discontinuous Galerkin methods [12], finite difference methods [6], finite element methods [24], [14], [13], Fourier spectral methods [9], [5], etc. However, to the best of the authors’ knowledge, not much effort, if not at all, are devoted to implementing, on GPUs, non-Fourier based spectral methods for problems with non-periodic boundary conditions. The aim of this paper is to exploit how to efficiently implement spectral methods with GPGPU. In particular, we shall design and implement an efficient matrix diagonalization method with spectral discretizations on the GPU, for solving elliptic and parabolic type equations with non-periodic boundary conditions on a d-dimensional rectangular domain (d=2,3).

It is well-known that, for separable elliptic equations, one can use the so-called matrix diagonalization technique (cf. [16], [11], [18]) to solve the linear systems from a spectral discretization in O(Nd+1) operations, where N is the number of points in each dimension. For time-dependent problems, these solvers are called repeatedly. Fortunately, the core part of this matrix diagonalization process is matrix–matrix multiplications, which can be efficiently parallelized on the GPU. More precisely, matrix–matrix multiplications can be efficiently addressed by the new generation CUDA tool, CUBLAS.

We found in the numerical experiments that the speedup of using GPU vs. CPU goes up as N increases. For example, the speedup reaches a factor of 15 with N=212 in the 2-D case, and a factor of 37 with N=29 in the 3-D case, where N is the number of points in each direction. In addition, computing times of different parts scale consistently with the complexity. In our algorithm, the throughput is regular and the global memory access is completely coalesced, indicating a good performance on the GPU.

To initialize the matrix diagonalization procedure, we need to first compute the eigenvalue and eigenvectors of 1-D problems on the host, then ship these eigenvalues and eigenvectors to the device because they will be used repeatedly in a typical time marching scheme for evolutionary problems.

This paper is organized as follows. Details of the matrix diagonalization method are described in Section 2 using the spectral-collocation method as an example. They provide a background for an in-depth discussion in Section 3 for the GPU parallelization of the same algorithm. In Section 4, we present some benchmark results for solving model elliptic equation in both 2-D and 3-D, and for solving 2-D multiphase flows based on the coupled system of Navier–Stokes equation and Allen–Cahn equation.

Section snippets

Matrix diagonalization method for spectral discretizations

In this section, we describe the matrix diagonalization method for spectral discretizations of separable equations. To fix the idea, we consider the following model equation on a rectangular domain Ω=(-1,1)d (d=2,3):αu-Δu=f,inΩ,u|Ω=0.

Design and implementation of spectral methods on the GPU

We discuss now in detail the design and implementation of the algorithm in the last section on the GPU. There are three basic strategies for optimizing the performance on GPUs (cf. [1], [2]): (i) maximizing memory throughput; (ii) maximizing utilization; and (iii) maximizing instruction throughput. In Section 3.1, we address strategy (i) and describe the overall strategy of the GPU parallelization for our algorithm. In Section 3.2 and 3.3, we focus on strategy (ii), by using both CUBLAS and a

Numerical experiments

In the first subsection, we demonstrate the performance of our spectral-collocation method on the GPU for some model elliptic equations in 2-D and 3-D. In the second subsection, we apply the algorithm to solve the Navier–Stokes equation and the Allen–Cahn equation through an energy stable scheme which leads to a sequence of Poisson type equations at each time step. The detailed information of the CPU and the GPU in use was described at the beginning of Section 3.

Acknowledgments

Authors first would like to thank the referees for helpful suggestions and comments. Feng Chen would like to thank Professor Xiaofeng Yang at the University of South Carolina and Yuhang Tang at Brown University for their help. Authors acknowledge the generous facility support from National Institute for Computational Sciences (NICS) at Oak Ridge National Laboratory and the Brown University Center for Computing and Visualization (CCV).

References (26)

  • D.M Dang, C.C. Christara, and K.R. Jackson, Parallel implementation on GPUs of ADI finite difference methods for...
  • D. Gottlieb et al.

    The spectrum of the Chebyshev collocation operator for the heat equation

    SIAM Journal on Numerical Analysis

    (1983)
  • J.L. Guermond et al.

    An overview of projection methods for incompressible flows

    Computer Methods in Applied Mechanics and Engineering

    (2006)
  • Cited by (8)

    View all citing articles on Scopus

    The research of Feng Chen is partly supported by OSD/AFOSR Grant FA9550-09-1-0613 and AFOSR Grant FA9550-12-1-0463. The research of Jie Shen is partly supported by NSF Grant DMS-1217066 and AFOSR Grant FA9550-11-1-0328.

    View full text