research-article

Porting the COSMO Weather Model to Manycore CPUs

Authors:
Felix Thaler
Swiss National Supercomputing Centre, CSCS, Zurich, Switzerland

Stefan Moosbrugger
Federal Institute of Meteorology and Climatology, MeteoSwiss, Zurich, Switzerland

Carlos Osuna
Federal Institute of Meteorology and Climatology, MeteoSwiss, Zurich, Switzerland

Mauro Bianco
Swiss National Supercomputing Centre, CSCS, Lugano, Switzerland

Hannes Vogt
Swiss National Supercomputing Centre, CSCS, Zurich, Switzerland

Anton Afanasyev
Swiss National Supercomputing Centre, CSCS, Zurich, Switzerland

Lukas Mosimann
Swiss National Supercomputing Centre, CSCS, Zurich, Switzerland

Oliver Fuhrer
Federal Institute of Meteorology and Climatology, MeteoSwiss, Zurich, Switzerland

Thomas C. Schulthess
Swiss National Supercomputing Centre, CSCS, Zurich, Switzerland

Torsten Hoefler
Scalable Parallel Computing Lab, ETH Zurich, Zurich, Switzerland

PASC '19: Proceedings of the Platform for Advanced Scientific Computing Conference, June 2019, Article No. 13, pages 1–11. https://doi.org/10.1145/3324989.3325723

Published: 12 June 2019

ABSTRACT

Weather and climate simulations are a major application driver in high-performance computing (HPC). With the end of Dennard scaling and Moore's law, the HPC industry increasingly relies on specialized accelerators to increase computational throughput. Manycore architectures such as Intel's Knights Landing (KNL) are representative of such future processing devices. However, software must be modified to use these devices efficiently. In this work, we demonstrate how an existing domain-specific language (DSL), originally designed for CPUs and GPUs, can be extended to manycore architectures such as KNL. On hand-tuned representative stencils from the dynamical core (dycore) of the COSMO weather model and on its radiation code, we achieve performance comparable to that of the NVIDIA Tesla P100 GPU. For the full DSL-based, GPU-optimized COSMO dycore, we achieve performance within a factor of two of the P100. We find that optimizing code for full performance on modern manycore architectures requires effort and hardware knowledge similar to that needed for GPUs. Finally, we show the limitations of the present approaches, summarize our lessons learned, and outline possible design principles for future accelerator DSLs in the weather and climate domain.
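To make "stencil" and "DSL backend" concrete, the following is a minimal, hypothetical sketch; it is not code from the paper or from the COSMO DSL. The user writes a point-wise stencil (here a 5-point horizontal Laplacian operator written as 4·centre minus the four neighbours, i.e. the negative of the standard discrete Laplacian), and separate "backends" decide how the grid is traversed. The names `laplacian`, `apply_naive` and `apply_tiled` and the tile size are assumptions for illustration only. A real CPU backend for KNL would be compiled code that blocks loops for cache and vectorizes the contiguous innermost index; pure Python is used here only to show the loop structure.

```python
from typing import Callable, List, Tuple

# A 2-D horizontal field stored as field[i][j]; j is treated as the
# contiguous (innermost) index, as a compiled backend would lay it out.
Field = List[List[float]]
Stencil = Callable[[Field, int, int], float]


def laplacian(f: Field, i: int, j: int) -> float:
    """Hypothetical point-wise stencil: 4*centre minus the 4 horizontal neighbours."""
    return 4.0 * f[i][j] - f[i + 1][j] - f[i - 1][j] - f[i][j + 1] - f[i][j - 1]


def apply_naive(stencil: Stencil, f: Field, halo: int = 1) -> Field:
    """Backend 1: plain row-major sweep over the interior (halo points stay 0)."""
    ni, nj = len(f), len(f[0])
    out = [[0.0] * nj for _ in range(ni)]
    for i in range(halo, ni - halo):
        for j in range(halo, nj - halo):  # innermost, contiguous index
            out[i][j] = stencil(f, i, j)
    return out


def apply_tiled(stencil: Stencil, f: Field, halo: int = 1,
                block: Tuple[int, int] = (4, 8)) -> Field:
    """Backend 2: same stencil, traversed in (bi x bj) tiles, the kind of
    loop blocking a cache-oriented CPU backend might apply."""
    ni, nj = len(f), len(f[0])
    bi, bj = block
    out = [[0.0] * nj for _ in range(ni)]
    for i0 in range(halo, ni - halo, bi):
        for j0 in range(halo, nj - halo, bj):
            # Clip each tile to the interior so no halo point is overwritten.
            for i in range(i0, min(i0 + bi, ni - halo)):
                for j in range(j0, min(j0 + bj, nj - halo)):
                    out[i][j] = stencil(f, i, j)
    return out


if __name__ == "__main__":
    # For f = i^2 + j^2 this operator yields -4 at every interior point.
    f = [[float(i * i + j * j) for j in range(10)] for i in range(9)]
    a, b = apply_naive(laplacian, f), apply_tiled(laplacian, f)
    print(a == b, a[4][5])  # True -4.0
```

In this sketch both backends produce identical results and only the traversal order differs. Keeping the stencil definition independent of the iteration strategy is the property that lets a DSL retarget one stencil description to different architectures, although a production backend would also control data layout, threading and vectorization.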



Published in

PASC '19: Proceedings of the Platform for Advanced Scientific Computing Conference
June 2019
177 pages
ISBN:9781450367707
DOI:10.1145/3324989

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
COSMO
Domain-Specific Languages
KNL
Supercomputing
Weather Forecasting
Qualifiers
- research-article
- Research
- Refereed limited

Article Metrics
- 17 total citations
- 196 total downloads
- Downloads (last 12 months): 17
- Downloads (last 6 weeks): 2