DOI: 10.1145/3578178.3578220

Fault Tolerance for Ensemble-based Molecular-Continuum Flow Simulations

Published: 27 February 2023

ABSTRACT

Molecular dynamics (MD) simulations are computationally expensive and therefore very time-consuming. This holds in particular for molecular-continuum simulations in fluid dynamics, which rely on ensembles of MD simulations coupled to computational fluid dynamics (CFD) solvers. Massively parallel implementations of the MD simulations and the respective ensembles are therefore of utmost importance.

However, the more processors are used for the molecular-continuum simulation, the higher the probability becomes that a software- or hardware-induced failure or malfunction of a single processor crashes the entire simulation. To avoid long re-calculation times, a fault tolerance mechanism is required, especially for simulations carried out at the exascale.
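As a back-of-the-envelope illustration (not a result from the paper): if each of N processes fails independently with probability p over the course of a run, the probability that at least one failure hits the simulation is

\[ P_{\text{fail}}(N) = 1 - (1 - p)^{N} \approx 1 - e^{-Np}, \]

so even a tiny per-process failure probability, say p = 10^{-4} per run, makes a failure almost certain at exascale-like process counts (N = 10^5 already gives P_fail ≈ 0.99995).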

In this paper, we introduce a fault tolerance method for molecular-continuum simulations implemented in the macro-micro-coupling tool (MaMiCo), an open-source coupling tool for such multiscale simulations that allows the re-use of one's favorite MD and CFD solvers. The method makes use of a dynamic ensemble handling approach that has previously been used to estimate statistical errors due to thermal fluctuations in the MD ensemble. The dynamic ensemble is always distributed homogeneously and, thus, balanced across the computational resources to minimize the induced overhead. The method further relies on an MPI implementation with fault tolerance support. We report scalability results with and without modeled system failures on three TOP500 supercomputers (Fugaku/RIKEN with ARM technology, Hawk/HLRS with AMD EPYC technology, and HSUper/Helmut Schmidt University with Intel Ice Lake processors) to demonstrate the feasibility of our approach.
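The abstract does not specify which fault-tolerant MPI interface is used, so the following is only a minimal sketch of the general pattern, assuming the ULFM extensions (MPIX_Comm_agree, MPIX_Comm_revoke, MPIX_Comm_shrink) as provided, e.g., by recent Open MPI releases; advanceEnsemble() is a hypothetical placeholder and not part of MaMiCo's API. The idea is that after a detected rank failure, the surviving ranks shrink the communicator and continue with a rebalanced dynamic ensemble rather than restarting the run.

```cpp
// Minimal sketch (not MaMiCo code) of a coupling loop that survives rank failures
// with a fault-tolerant MPI. Assumes the ULFM extensions are available.
#include <mpi.h>
#include <mpi-ext.h>  // ULFM prototypes; availability depends on the MPI library

// Hypothetical placeholder: run one molecular-continuum coupling cycle
// on the MD instances owned by this rank.
static int advanceEnsemble(MPI_Comm comm) {
  (void)comm;
  return MPI_SUCCESS;
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  MPI_Comm world;
  MPI_Comm_dup(MPI_COMM_WORLD, &world);
  // Report errors instead of aborting, so process failures can be handled.
  MPI_Comm_set_errhandler(world, MPI_ERRORS_RETURN);

  for (int cycle = 0; cycle < 100; ++cycle) {
    const int rc = advanceEnsemble(world);

    // All surviving ranks agree on whether this cycle completed everywhere.
    int ok = (rc == MPI_SUCCESS) ? 1 : 0;
    const int agreeRc = MPIX_Comm_agree(world, &ok);

    if (agreeRc != MPI_SUCCESS || !ok) {
      // A rank failed: invalidate the old communicator, build a smaller one
      // from the survivors, and continue with a rebalanced ensemble instead
      // of restarting the whole simulation.
      MPIX_Comm_revoke(world);
      MPI_Comm survivors;
      MPIX_Comm_shrink(world, &survivors);
      MPI_Comm_free(&world);
      world = survivors;
      MPI_Comm_set_errhandler(world, MPI_ERRORS_RETURN);
      // ... redistribute MD instances homogeneously over the surviving ranks ...
    }
  }

  MPI_Comm_free(&world);
  MPI_Finalize();
  return 0;
}
```

In a real coupled simulation, the redistribution step would also have to restore or regenerate the MD instances that lived on the failed ranks, e.g., from the remaining ensemble members.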


Published in

HPCAsia '23: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
February 2023, 161 pages
ISBN: 9781450398053
DOI: 10.1145/3578178

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


                Qualifiers

                • research-article
                • Research
                • Refereed limited

                Acceptance Rates

HPCAsia '23 Paper Acceptance Rate: 15 of 34 submissions, 44%. Overall Acceptance Rate: 69 of 143 submissions, 48%.
