ABSTRACT
Scalability and time-to-solution studies have historically focused on problem size and run time. We consider a stricter definition of "solution" in which live data analysis (co-visualization of either the full data or in situ data extracts) provides continuous and reconfigurable insight into massively parallel simulations. Specifically, we used the Argonne Leadership Computing Facility's (ALCF) BlueGene/P machine, whose 163,840 cores are tightly linked through a high-speed network to 100 visualization nodes that share 800 cores and 200 GPUs. Three meshes of 52M, 416M, and 3.3B elements discretize the flow over a full swept wing with an unsteady synthetic jet and are used to evaluate time-to-solution plus insight. On the full machine, the 416M element mesh takes about 2 seconds per flow solve step, including the extraction and rendering of a slice or a contour, which currently slows the simulation by only 10% and 15%, respectively. The 3.3B element case proved scalable at about 15 seconds per time step, whereas PHASTA's strong scaling could compress the time-to-solution for the 52M element case enough to allow the rendering of one frame (slice or contour) every 0.7 seconds, paving the way for interactive simulation and simulation steering on massively parallel systems.
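To put the quoted figures in perspective, the following minimal sketch works out the average number of elements per core for each mesh and the solver-only step time implied by the reported slowdowns. It is an illustration under stated assumptions, not part of the study: the function names are hypothetical, the elements are assumed to be spread evenly over all 163,840 cores, and the 10% and 15% slowdowns are assumed to be measured relative to the solver-only step time.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Assumptions (not stated in the source): elements are evenly distributed
# over all cores, and the slowdown is relative to the solver-only step time.

CORES = 163_840  # BlueGene/P cores quoted in the abstract

# Mesh sizes quoted in the abstract (number of elements).
MESHES = {"52M": 52e6, "416M": 416e6, "3.3B": 3.3e9}


def elements_per_core(n_elements, cores=CORES):
    """Average number of mesh elements handled by each core."""
    return n_elements / cores


def solver_only_time(step_time_with_vis, slowdown_fraction):
    """Step time without co-visualization, given the step time with it.

    Assumes step_time_with_vis = solver_time * (1 + slowdown_fraction).
    """
    return step_time_with_vis / (1.0 + slowdown_fraction)


if __name__ == "__main__":
    for name, n_elements in MESHES.items():
        print(f"{name} mesh: {elements_per_core(n_elements):,.0f} elements per core")

    # 416M element mesh: about 2 s per step including slice or contour rendering.
    for extract, slowdown in (("slice", 0.10), ("contour", 0.15)):
        base = solver_only_time(2.0, slowdown)
        print(f"{extract}: solver-only step of about {base:.2f} s")
```

Under these assumptions, the 52M element case leaves only a few hundred elements per core, which is consistent with the abstract's emphasis on strong scaling for that mesh.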
Index Terms
Electronic poster: co-visualization of full data and in situ data extracts from unstructured grid cfd at 160k cores