
A Multi-GPU Implementation of a D2Q37 Lattice Boltzmann Code

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2011)

Abstract

We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluster based on Nvidia Fermi processors. We analyze how to optimize the algorithm for GP-GPU architectures, describe the implementation choices we adopted, and compare our performance results with those of an implementation optimized for latest-generation multi-core CPUs. Our program runs at ≈ 30% of the double-precision peak performance of a single GPU and shows almost linear scaling when run on the multi-GPU cluster.
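The abstract does not say how the lattice is distributed across GPUs. As a hedged illustration only, the Python sketch below shows one common approach for a stencil code like this one: a one-dimensional slab decomposition with halo columns. Everything in it is an assumption rather than the paper's method. The D2Q37 velocity set is written as all integer vectors with |c_x|, |c_y| ≤ 3 and |c_x| + |c_y| ≤ 4 (37 vectors, longest hop of 3 sites). The lattice is periodic, Python lists stand in for per-GPU memory, and the function names (`split`, `exchange_halos`, `propagate_local`, `check_decomposition`) are hypothetical. Only the propagate step is shown. The collision step acts on each lattice site independently and needs no communication.

```python
# Minimal sketch, not the authors' implementation: a periodic 2D lattice is
# cut into slabs along x, one per (simulated) GPU. Each slab carries HALO
# extra columns on both sides, filled by a halo exchange before propagation.

# Assumed D2Q37 stencil: integer velocities with |cx|, |cy| <= 3 and
# |cx| + |cy| <= 4, which gives 37 vectors whose longest hop is 3 sites.
VELOCITIES = [(cx, cy)
              for cx in range(-3, 4)
              for cy in range(-3, 4)
              if abs(cx) + abs(cy) <= 4]
HALO = max(abs(cx) for cx, _ in VELOCITIES)  # halo width along x


def propagate_global(f, nx, ny):
    """Reference pull-style propagation on the whole periodic lattice.

    f[i][x][y] is population i at site (x, y).
    """
    return [[[f[i][(x - cx) % nx][(y - cy) % ny] for y in range(ny)]
             for x in range(nx)]
            for i, (cx, cy) in enumerate(VELOCITIES)]


def split(f, nx, ndev):
    """Cut f into ndev slabs along x, padding each with empty halo columns."""
    assert nx % ndev == 0 and nx // ndev >= HALO
    w = nx // ndev
    return [[[None] * HALO
             + [list(f[i][x]) for x in range(d * w, (d + 1) * w)]
             + [None] * HALO
             for i in range(len(VELOCITIES))]
            for d in range(ndev)]


def exchange_halos(slabs, w):
    """Fill halo columns from the periodic left/right neighbour slabs.

    On real hardware this is the inter-GPU communication step.
    """
    ndev = len(slabs)
    for d, slab in enumerate(slabs):
        left, right = slabs[(d - 1) % ndev], slabs[(d + 1) % ndev]
        for i in range(len(VELOCITIES)):
            for h in range(HALO):
                # Left halo <- last HALO bulk columns of the left neighbour.
                slab[i][h] = list(left[i][w + h])
                # Right halo <- first HALO bulk columns of the right neighbour.
                slab[i][HALO + w + h] = list(right[i][HALO + h])


def propagate_local(slab, w, ny):
    """Pull-style propagation of one slab's bulk columns, using only local data."""
    return [[[slab[i][HALO + x - cx][(y - cy) % ny] for y in range(ny)]
             for x in range(w)]
            for i, (cx, cy) in enumerate(VELOCITIES)]


def check_decomposition(nx, ny, ndev):
    """Return True if the multi-slab result matches single-domain propagation."""
    # Distinct values per (i, x, y) so any misrouted population is detected.
    f = [[[10000 * i + 100 * x + y for y in range(ny)] for x in range(nx)]
         for i in range(len(VELOCITIES))]
    w = nx // ndev
    slabs = split(f, nx, ndev)
    exchange_halos(slabs, w)
    local = [propagate_local(s, w, ny) for s in slabs]
    # Concatenate the slabs' bulk columns back together along x.
    stitched = [[col for part in local for col in part[i]]
                for i in range(len(VELOCITIES))]
    return stitched == propagate_global(f, nx, ny)


if __name__ == "__main__":
    print(len(VELOCITIES), HALO, check_decomposition(24, 8, 4))
```

The check compares the stitched multi-slab result against a single-domain propagation. Populations move at most HALO sites per step, so exchanging HALO columns on each side before every propagate step is sufficient. In a real multi-GPU code the exchange would use GPU-to-GPU or network transfers. Overlapping that exchange with computation on the bulk is a common optimization, but the abstract does not say whether this paper uses it.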




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Biferale, L. et al. (2012). A Multi-GPU Implementation of a D2Q37 Lattice Boltzmann Code. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2011. Lecture Notes in Computer Science, vol 7203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31464-3_65


  • DOI: https://doi.org/10.1007/978-3-642-31464-3_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31463-6

  • Online ISBN: 978-3-642-31464-3

  • eBook Packages: Computer Science, Computer Science (R0)
