skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm

Journal Article · · IEEE Transactions on Parallel and Distributed Systems

GPU performance of the lattice Boltzmann method (LBM) depends heavily on memory access patterns. When implemented with GPUs on complex domains, typically, geometric data is accessed indirectly and lattice data is accessed lexicographically. Although there are a variety of other options, no study has examined the relative efficacy between them. Here, we examine a suite of memory access schemes via empirical testing and performance modeling. We find strong evidence that semi-direct is often better suited than the more common indirect addressing, providing increased computational speed and reducing memory consumption. For the layout, we find that the Collected Structure of Arrays (CSoA) and bundling layouts outperform the common Structure of Array layout; on V100 and P100 devices, CSoA consistently outperforms bundling, however the relationship is more complicated on K40 devices. When compared to state-of-the-art practices, our recommendations lead to speedups of 10–40 percent and reduce memory consumption up to 17 percent. Using performance modeling and computational experimentation, we determine the mechanisms behind the accelerations. We demonstrate that our results hold across multiple GPUs on two leadership class systems, and present the first near-optimal strong results for LBM with arterial geometries run on GPUs.

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1807241
Journal Information:
IEEE Transactions on Parallel and Distributed Systems, Vol. 32, Issue 10; ISSN 1045-9219
Publisher:
IEEECopyright Statement
Country of Publication:
United States
Language:
English

Similar Records

GPU Data Access on Complex Geometries for D3Q19 Lattice Boltzmann Method
Conference · Tue May 01 00:00:00 EDT 2018 · OSTI ID:1807241

GSoFa: Scalable Sparse Symbolic LU Factorization on GPUs
Journal Article · Fri Apr 01 00:00:00 EDT 2022 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID:1807241

Evaluating the Performance of Integer Sum Reduction in SYCL on GPUs
Conference · Sun Aug 01 00:00:00 EDT 2021 · OSTI ID:1807241