DOI: 10.1145/2807591.2807627
research-article

STELLA: a domain-specific tool for structured grid methods in weather and climate models

Published: 15 November 2015

Abstract

Many high-performance computing applications that solve partial differential equations (PDEs) belong to the class of kernels using stencils on structured grids. Due to the disparity between floating-point operation throughput and main memory bandwidth, these codes typically achieve only a low fraction of peak performance. Unfortunately, stencil computation optimization techniques are often hardware dependent and lead to a significant increase in code complexity. We present a domain-specific tool, STELLA, which eases the burden on the application developer by separating the architecture-dependent implementation strategy from the user code and is targeted at multi- and manycore processors. Using the example of a numerical weather prediction and regional climate model (COSMO), we demonstrate the usefulness of STELLA for a real-world production code. The dynamical core based on STELLA achieves a speedup factor of 1.8x (CPU) and 5.8x (GPU) with respect to the legacy code while reducing the complexity of the user code.
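
The separation described in the abstract can be illustrated with a minimal C++ sketch. This is not STELLA's API: the names `Field`, `Laplacian`, `NaiveBackend`, `BlockedBackend`, and `apply_stencil` are hypothetical and chosen only for illustration. The user code states what is computed at one grid point, and an interchangeable backend decides how the grid is traversed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D field with a one-cell halo (not a STELLA type).
// Interior indices run over 0..nx-1 and 0..ny-1; -1 and nx/ny address the halo.
struct Field {
    int nx, ny;
    std::vector<double> data;
    Field(int nx_, int ny_, double v = 0.0)
        : nx(nx_), ny(ny_),
          data(static_cast<std::size_t>((nx_ + 2) * (ny_ + 2)), v) {}
    double& operator()(int i, int j) {
        return data[static_cast<std::size_t>((j + 1) * (nx + 2) + (i + 1))];
    }
    double operator()(int i, int j) const {
        return data[static_cast<std::size_t>((j + 1) * (nx + 2) + (i + 1))];
    }
};

// User code: the update for a single grid point (5-point Laplacian).
// It contains no loops and no hardware-specific decisions.
struct Laplacian {
    static double apply(const Field& in, int i, int j) {
        return in(i + 1, j) + in(i - 1, j) + in(i, j + 1) + in(i, j - 1)
               - 4.0 * in(i, j);
    }
};

// Backend 1: plain nested loops over the interior.
struct NaiveBackend {
    template <class Stencil>
    static void run(const Field& in, Field& out) {
        assert(in.nx == out.nx && in.ny == out.ny);
        for (int j = 0; j < in.ny; ++j)
            for (int i = 0; i < in.nx; ++i)
                out(i, j) = Stencil::apply(in, i, j);
    }
};

// Backend 2: the same computation traversed in square blocks,
// one example of an architecture-motivated loop strategy.
struct BlockedBackend {
    template <class Stencil>
    static void run(const Field& in, Field& out) {
        assert(in.nx == out.nx && in.ny == out.ny);
        const int B = 4;  // block edge length, an arbitrary illustrative value
        for (int jb = 0; jb < in.ny; jb += B)
            for (int ib = 0; ib < in.nx; ib += B)
                for (int j = jb; j < std::min(jb + B, in.ny); ++j)
                    for (int i = ib; i < std::min(ib + B, in.nx); ++i)
                        out(i, j) = Stencil::apply(in, i, j);
    }
};

// Selecting a backend is a compile-time choice; the stencil is unchanged.
template <class Backend, class Stencil>
void apply_stencil(const Field& in, Field& out) {
    Backend::template run<Stencil>(in, out);
}
```

Calling `apply_stencil<NaiveBackend, Laplacian>(in, out)` and `apply_stencil<BlockedBackend, Laplacian>(in, out)` on the same input should produce identical interior values, since only the traversal order differs. A production tool would also handle three dimensions, halo exchange, and GPU execution, none of which this sketch attempts.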

Published In

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015
985 pages
ISBN:9781450337236
DOI:10.1145/2807591
  • General Chair: Jackie Kern
  • Program Chair: Jeffrey S. Vetter
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. atmospheric model
  2. domain-specific language
  3. heterogeneous system
  4. stencil

Qualifiers

  • Research-article

Conference

SC15

Acceptance Rates

SC '15 Paper Acceptance Rate 79 of 358 submissions, 22%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
