DOI: 10.1145/3324989.3325723

Porting the COSMO Weather Model to Manycore CPUs

Published: 12 June 2019

ABSTRACT

Weather and climate simulations are a major application driver in high-performance computing (HPC). With the end of Dennard scaling and Moore's law, the HPC industry increasingly employs specialized accelerators to increase computational throughput. Manycore architectures, such as Intel's Knights Landing (KNL), are representative examples of such future processing devices. However, software has to be modified to use these devices efficiently. In this work, we demonstrate how an existing domain-specific language (DSL) designed for CPUs and GPUs can be extended to manycore architectures such as KNL. On hand-tuned, representative stencils from the dynamical core of the COSMO weather model and its radiation code, we achieve performance comparable to the NVIDIA Tesla P100 GPU. Further, for the full DSL-based, GPU-optimized COSMO dycore, we reach performance within a factor of two of the P100. We find that optimizing code for full performance on modern manycore architectures requires effort and hardware knowledge similar to what GPUs demand. Finally, we show limitations of the present approaches and outline lessons learned and possible principles for the design of future DSLs for accelerators in the weather and climate domain.
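
The stencil kernels referred to above are not reproduced on this page, but a minimal, hypothetical sketch may help illustrate the kind of computation and the kind of manycore tuning involved. The C++/OpenMP example below applies a horizontal Laplacian, a stencil motif typical of dycores, to a 3D field, spreading threads over the k and j loops and leaving the contiguous i loop to the compiler's vectorizer, as one would broadly do for a wide-SIMD manycore CPU such as KNL. The field layout, sizes, and thread/SIMD split are illustrative assumptions, not the paper's DSL output or its hand-tuned code.

// Hypothetical sketch (not the paper's code): a horizontal Laplacian stencil,
// representative of dycore stencils, threaded and vectorized in the way one
// typically targets a manycore CPU (many threads + wide SIMD lanes).
#include <cstddef>
#include <vector>

// Simple 3D field with i as the contiguous (vectorized) dimension.
struct Field3D {
    std::size_t ni, nj, nk;
    std::vector<double> data;
    Field3D(std::size_t ni_, std::size_t nj_, std::size_t nk_)
        : ni(ni_), nj(nj_), nk(nk_), data(ni_ * nj_ * nk_, 0.0) {}
    double& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data[i + ni * (j + nj * k)];
    }
    double operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return data[i + ni * (j + nj * k)];
    }
};

// 4-point horizontal Laplacian on the interior of every k-level.
// Threads cover the k and j loops; the innermost i loop is vectorized
// (e.g. AVX-512 on KNL when compiled with OpenMP enabled).
void horizontal_laplacian(const Field3D& in, Field3D& out) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (std::size_t k = 0; k < in.nk; ++k) {
        for (std::size_t j = 1; j < in.nj - 1; ++j) {
            #pragma omp simd
            for (std::size_t i = 1; i < in.ni - 1; ++i) {
                out(i, j, k) = in(i + 1, j, k) + in(i - 1, j, k)
                             + in(i, j + 1, k) + in(i, j - 1, k)
                             - 4.0 * in(i, j, k);
            }
        }
    }
}

int main() {
    Field3D in(128, 128, 60), out(128, 128, 60);  // illustrative domain size
    in(64, 64, 30) = 1.0;                          // point perturbation
    horizontal_laplacian(in, out);
    return 0;
}

On KNL-class hardware one would additionally experiment with loop blocking, thread placement, and MCDRAM usage; those tunables are omitted from this sketch.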


• Published in

PASC '19: Proceedings of the Platform for Advanced Scientific Computing Conference
June 2019, 177 pages
ISBN: 9781450367707
DOI: 10.1145/3324989

Copyright © 2019 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • research-article
          • Research
          • Refereed limited
