A Multi-GPU Implementation of a D2Q37 Lattice Boltzmann Code

Biferale, Luca; Mantovani, Filippo; Pivanti, Marcello; Pozzati, Fabio; Sbragaglia, Mauro; Scagliarini, Andrea; Schifano, Sebastiano Fabio; Toschi, Federico; Tripiccione, Raffaele

doi:10.1007/978-3-642-31464-3_65

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7203)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluster based on Nvidia Fermi processors. We analyze how to optimize the algorithm for GP-GPU architectures, describe the implementation choices we have adopted, and compare our performance with that of an implementation optimized for latest-generation multi-core CPUs. Our program runs at ≈ 30% of the double-precision peak performance of one GPU and shows almost linear scaling when run on the multi-GPU cluster.
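
The two figures of merit quoted in the abstract, fraction of peak performance and scaling across GPUs, can be made concrete with a minimal sketch. The function names and all numeric values below are illustrative assumptions and are not taken from the paper; the efficiency formula shown is the usual strong-scaling definition, which may differ from the measure the authors report.

```python
def fraction_of_peak(sustained_gflops: float, peak_gflops: float) -> float:
    """Sustained performance as a fraction of the theoretical peak."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return sustained_gflops / peak_gflops


def parallel_efficiency(t_single: float, t_multi: float, n_gpus: int) -> float:
    """Strong-scaling efficiency: speedup over one GPU divided by GPU count.

    A value of 1.0 corresponds to perfectly linear scaling.
    """
    if t_single <= 0 or t_multi <= 0 or n_gpus <= 0:
        raise ValueError("times and GPU count must be positive")
    speedup = t_single / t_multi
    return speedup / n_gpus


if __name__ == "__main__":
    # Placeholder inputs chosen for illustration only (hypothetical,
    # NOT measurements from the paper).
    print(fraction_of_peak(sustained_gflops=150.0, peak_gflops=500.0))
    print(parallel_efficiency(t_single=80.0, t_multi=10.5, n_gpus=8))
```

In this reading, "≈ 30% of peak" corresponds to `fraction_of_peak` returning about 0.3, and "almost linear scaling" to `parallel_efficiency` staying close to 1.0 as the number of GPUs grows.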

Author information

Authors and Affiliations

University of Tor Vergata and INFN, Roma, Italy
Luca Biferale & Mauro Sbragaglia
Deutsches Elektronen Synchrotron (DESY), Zeuthen, Germany
Filippo Mantovani
University of Ferrara and INFN, Ferrara, Italy
Marcello Pivanti, Sebastiano Fabio Schifano & Raffaele Tripiccione
Fondazione Bruno Kessler Trento, Trento, Italy
Fabio Pozzati
University of Barcelona, Barcelona, Spain
Andrea Scagliarini
Eindhoven University of Technology, The Netherlands
Federico Toschi
CNR-IAC, Rome, Italy
Federico Toschi

Editor information

Editors and Affiliations

Institute of Computer and Information Science, Czestochowa University of Technology, Dabrowskiego 69, 42-201, Czestochowa, Poland
Roman Wyrzykowski & Konrad Karczewski
Electrical Engineering and Computer Science Department, University of Tennessee, 1122 Volunteer Blvd, 37996-3450, Knoxville, TN, USA
Jack Dongarra
Department of Informatics and Mathematical Modeling, Technical University of Denmark, Richard Petersens Plads, Building 321, 2800, Kongens Lyngby, Denmark
Jerzy Waśniewski

Cite this paper

Biferale, L. et al. (2012). A Multi-GPU Implementation of a D2Q37 Lattice Boltzmann Code. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2011. Lecture Notes in Computer Science, vol 7203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31464-3_65

DOI: https://doi.org/10.1007/978-3-642-31464-3_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31463-6
Online ISBN: 978-3-642-31464-3
eBook Packages: Computer Science, Computer Science (R0)
