Experience on Vectorizing Lattice Boltzmann Kernels for Multi- and Many-Core Architectures

Calore, Enrico; Demo, Nicola; Schifano, Sebastiano Fabio; Tripiccione, Raffaele

doi:10.1007/978-3-319-32149-3_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9573)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

Current development trends for fast processors call for an increasing number of cores, each featuring wide vector processing units. Applications must therefore exploit both levels of parallelism to run efficiently. In this work we focus on the efficient use of vector instructions. These process several data elements in parallel, and the memory data layout plays an important role in making this efficient. The optimal memory layout depends in principle on the access patterns of the algorithm, but also on the architectural features of the processor. Moreover, different parts of an application may have different requirements, so the choice of the most efficient data structure for vectorization has to be carefully assessed. We address these problems for a Lattice Boltzmann (LB) code, widely used in computational fluid dynamics. We consider a state-of-the-art two-dimensional LB model that accurately reproduces the thermo-hydrodynamics of a 2D fluid. We write our codes in C and expose vector parallelism using a directive-based programming approach. We consider different data layouts and analyze the corresponding performance. Our results show that, if an appropriate data layout is selected, it is possible to write code for this class of applications that is automatically vectorized and performance-portable across several architectures. We end up with a single code that runs efficiently on traditional multi-core processors as well as on recent many-core systems such as the Xeon Phi.
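
To illustrate why the data layout matters for vectorization, the following is a minimal sketch in C, not the authors' code. It contrasts two commonly compared layouts, array of structures (AoS) and structure of arrays (SoA); the abstract does not say which layouts were evaluated, so this choice is an assumption. The names `NPOP`, `NSITES`, `site_aos_t`, `lattice_soa_t`, `density_aos` and `density_soa` are hypothetical, and the OpenMP `simd` directive is used here only as one example of a directive-based approach.

```c
#include <assert.h>
#include <stddef.h>

#define NPOP   4     /* populations per lattice site (illustrative value) */
#define NSITES 1024  /* number of lattice sites (illustrative value) */

/* Array of Structures: all populations of one site are contiguous. */
typedef struct { double p[NPOP]; } site_aos_t;

/* Structure of Arrays: one contiguous array per population index. */
typedef struct { double p[NPOP][NSITES]; } lattice_soa_t;

/* Sum the populations of each site (a density-like quantity), AoS layout.
   The same population of neighbouring sites is NPOP doubles apart, so a
   vector load across sites is strided rather than contiguous. */
void density_aos(const site_aos_t *lat, double *rho, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (int k = 0; k < NPOP; k++)
            s += lat[i].p[k];
        rho[i] = s;
    }
}

/* Same computation, SoA layout: the inner loop runs over sites with unit
   stride, which is the access pattern vector units handle best. */
void density_soa(const lattice_soa_t *lat, double *rho, size_t n)
{
    assert(n <= NSITES);
    for (size_t i = 0; i < n; i++)
        rho[i] = 0.0;
    for (int k = 0; k < NPOP; k++) {
        /* Directive asking the compiler to vectorize the site loop. */
        #pragma omp simd
        for (size_t i = 0; i < n; i++)
            rho[i] += lat->p[k][i];
    }
}
```

Both functions compute the same values; only the memory access pattern differs. With GCC or Clang the directive is honoured when compiling with `-fopenmp-simd` (or `-fopenmp`) and is otherwise ignored, so the code stays portable plain C.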

Acknowledgements

This work was done in the framework of the COKA, COSA and SUMA projects of INFN. We would like to thank CINECA (Italy) for access to their HPC systems.

Author information

Authors and Affiliations

Università di Ferrara, Ferrara, Italy
Enrico Calore, Nicola Demo, Sebastiano Fabio Schifano & Raffaele Tripiccione
INFN Ferrara, Ferrara, Italy
Enrico Calore, Sebastiano Fabio Schifano & Raffaele Tripiccione

Corresponding author

Correspondence to Sebastiano Fabio Schifano.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
Department of Computer Science, University of Southern California, Marina Del Rey, California, USA
Ewa Deelman
Electrical Engineering & Comput. Science, University of Tennessee, Knoxville, Tennessee, USA
Jack Dongarra
Czestochowa University of Technology, Institute of Computer & Information Sci., Czestochowa, Poland
Konrad Karczewski
Department of Computer Science, AGH University of Science and Technology, Krakow, Poland
Jacek Kitowski
Systèmes d’informations, Big Data et Rec, AGH University of Science and Technology, Krakow, Poland
Kazimierz Wiatr

About this paper

Cite this paper

Calore, E., Demo, N., Schifano, S.F., Tripiccione, R. (2016). Experience on Vectorizing Lattice Boltzmann Kernels for Multi- and Many-Core Architectures. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2015. Lecture Notes in Computer Science, vol 9573. Springer, Cham. https://doi.org/10.1007/978-3-319-32149-3_6

DOI: https://doi.org/10.1007/978-3-319-32149-3_6
Published: 02 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32148-6
Online ISBN: 978-3-319-32149-3
eBook Packages: Computer Science, Computer Science (R0)
