DOI: 10.1145/3492805.3492811
research-article

Performance Evaluation of Lattice Boltzmann Method for Fluid Simulation on A64FX Processor and Supercomputer Fugaku

Published: 07 January 2022

Abstract

The lattice Boltzmann method (LBM) has recently become popular as an alternative to Navier-Stokes solvers for large-scale fluid simulations. We conduct a performance study of the LBM on the A64FX, the Arm-based processor of the supercomputer Fugaku. We compare four data layouts (SoA, AoS, Clustered SoA (CSoA), and CSoA2) and three algorithms for the LBM streaming step (the Pull, Push, and Swap schemes). Performance measurements on a single CMG (Core Memory Group) show that the combination of the CSoA2 layout and the Swap scheme achieves the highest performance, 176 GFLOPS, which corresponds to 11.5% of the single-precision peak performance. Our simulations demonstrate good weak scaling up to 16,384 nodes and achieve 10.9 PFLOPS in single precision. Strong scaling is also good, with parallel efficiencies of 63.9%, 68.3%, and 72.7% for the D3Q15, D3Q19, and D3Q27 velocity models, respectively, when scaling from 512 to 16,384 nodes.
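To make the layout comparison concrete, the following is a minimal sketch of how AoS, SoA, and a clustered layout could map a (cell, direction) pair to an offset in a flat buffer. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not define CSoA or CSoA2, so `clustered_index` shows only a generic blocked layout, and the names `N`, `W`, and all function names are hypothetical.

```python
# Illustrative index maps for populations f[cell, q] stored in one flat buffer.
Q = 19   # discrete velocities per cell (D3Q19, one of the models named in the abstract)
N = 64   # number of lattice cells (hypothetical; chosen as a multiple of W)
W = 16   # cluster width (assumption: 16 single-precision values per 512-bit vector)

def aos_index(cell, q):
    """Array of Structures: the Q values of one cell are adjacent."""
    return cell * Q + q

def soa_index(cell, q):
    """Structure of Arrays: all cells of one direction q are adjacent."""
    return q * N + cell

def clustered_index(cell, q):
    """Blocked layout (assumed): W cells form a cluster, stored SoA-wise inside it."""
    block, lane = divmod(cell, W)
    return (block * Q + q) * W + lane

def cell_stride(index_fn, q=0):
    """Distance in the flat buffer between cells 0 and 1 for a fixed direction."""
    return index_fn(1, q) - index_fn(0, q)

if __name__ == "__main__":
    for fn in (aos_index, soa_index, clustered_index):
        print(fn.__name__, "stride between neighbouring cells:", cell_stride(fn))
```

In this sketch, SoA and the clustered layout give unit stride across neighbouring cells for a fixed direction, which is the access pattern vector loads favour, while AoS gives a stride of `Q`. The clustered form additionally keeps all directions of one cluster close together in memory.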


          Published In

          HPCAsia '22: International Conference on High Performance Computing in Asia-Pacific Region
          January 2022
          145 pages
          ISBN:9781450384988
          DOI:10.1145/3492805
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. A64FX
          2. Fugaku
          3. data structure
          4. large scale CFD simulation
          5. lattice Boltzmann method

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Funding Sources

          • HPCI System Research Project

          Conference

          HPC Asia2022

          Acceptance Rates

          Overall Acceptance Rate 69 of 143 submissions, 48%
