DOI: 10.1145/3303117.3306160
research-article

Automatic Vectorization of Stencil Codes with the GGDML Language Extensions

Published: 16 February 2019

ABSTRACT

Partial differential equation (PDE) solvers are important for many applications. PDE solvers execute kernels that apply stencil operations over 2D and 3D grids. Because PDE solvers and stencil codes are widely used in performance-critical applications, they must be well optimized.
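To make the notion of a stencil operation concrete, the following C sketch applies a five-point Laplacian stencil to the interior of a 2D grid stored in row-major order. It is a generic illustration, not code from the paper's GGDML prototype; the function `laplacian_5pt` and the `IDX` macro are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Index into a row-major nx-by-ny grid; j is the contiguous (fastest-varying) index. */
#define IDX(i, j, ny) ((size_t)(i) * (size_t)(ny) + (size_t)(j))

/* Apply a five-point Laplacian stencil to the interior of a 2D grid.
 * Each output point reads its own input value and its four direct
 * neighbors; boundary points of `out` are left untouched. */
void laplacian_5pt(size_t nx, size_t ny, const double *in, double *out)
{
    assert(nx >= 3 && ny >= 3);
    for (size_t i = 1; i < nx - 1; ++i) {
        for (size_t j = 1; j < ny - 1; ++j) {
            out[IDX(i, j, ny)] = in[IDX(i - 1, j, ny)] + in[IDX(i + 1, j, ny)]
                               + in[IDX(i, j - 1, ny)] + in[IDX(i, j + 1, ny)]
                               - 4.0 * in[IDX(i, j, ny)];
        }
    }
}
```

For example, with `in[i][j] = i*i + j*j`, every interior output equals 4.0, because each second difference of a quadratic is exactly 2.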

Stencil computations naturally depend on neighboring grid elements. Therefore, data locality must be exploited to optimize the code and make better use of the memory bandwidth; at the same time, the vector processing capabilities of the processor must be utilized.
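A minimal sketch of these two concerns, assuming a row-major C layout: the inner loop walks the contiguous dimension with unit stride, `restrict` promises the compiler that the output row does not alias the input rows, and `#pragma omp simd` requests vectorization of that loop (it only takes effect with a compiler flag such as `-fopenmp` or `-fopenmp-simd`; otherwise the pragma is ignored). The names `stencil_row` and `laplacian_rows` are hypothetical, and this is not the code the GGDML translator generates.

```c
#include <assert.h>
#include <stddef.h>

/* Update one interior row of a five-point stencil.
 * `up`, `mid` and `down` point to three consecutive input rows and `dst`
 * to the matching output row. All accesses along j are unit-stride, and
 * `restrict` rules out aliasing between the output and the inputs, which
 * allows the compiler to vectorize the loop. */
static void stencil_row(size_t ny,
                        const double *restrict up,
                        const double *restrict mid,
                        const double *restrict down,
                        double *restrict dst)
{
    assert(ny >= 3);
#pragma omp simd
    for (size_t j = 1; j < ny - 1; ++j)
        dst[j] = up[j] + down[j] + mid[j - 1] + mid[j + 1] - 4.0 * mid[j];
}

/* Sweep the interior rows. Each input row is read by three consecutive
 * row updates, so it tends to still be in cache when it is reused. */
void laplacian_rows(size_t nx, size_t ny,
                    const double *restrict in, double *restrict out)
{
    assert(nx >= 3);
    for (size_t i = 1; i < nx - 1; ++i)
        stencil_row(ny, in + (i - 1) * ny, in + i * ny, in + (i + 1) * ny,
                    out + i * ny);
}
```

Keeping the row update in its own small function gives the vectorizer a simple loop body: a handful of unit-stride input streams and one unit-stride output stream.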

In this work, we investigate the effectiveness of high-level language extensions for exploiting the SIMD and vectorization features of multicore processors and vector engines. We write a prototype application using the GGDML high-level language extensions and translate the high-level code with different configurations. This allows us to evaluate how efficiently the language extensions and the source-to-source translation process exploit the vector units of multicore processors and vector engines.
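Translating the same high-level code with different configurations can change the loop structure of the generated code without changing its result. As a hypothetical illustration (the abstract does not specify which configuration options GGDML offers), the sketch below shows one such variant: the column loop of a five-point stencil is tiled into blocks whose width `BLOCK_J` stands in for a translation-time parameter. The name `laplacian_blocked` and the default block width of 64 are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical translation-time parameter: width of the column blocks.
 * A source-to-source translator could emit this value as a constant. */
#ifndef BLOCK_J
#define BLOCK_J 64
#endif

/* Five-point stencil with the column loop tiled into blocks of BLOCK_J
 * points. All interior rows of one block are processed before moving on
 * to the next block, so the input row segments of a block can be reused
 * from cache; the innermost loop remains unit-stride for SIMD. */
void laplacian_blocked(size_t nx, size_t ny, const double *in, double *out)
{
    assert(nx >= 3 && ny >= 3);
    for (size_t jb = 1; jb < ny - 1; jb += BLOCK_J) {
        /* Last column (exclusive) of this block, clipped to the interior. */
        size_t jend = (jb + BLOCK_J < ny - 1) ? jb + BLOCK_J : ny - 1;
        for (size_t i = 1; i < nx - 1; ++i) {
#pragma omp simd
            for (size_t j = jb; j < jend; ++j)
                out[i * ny + j] = in[(i - 1) * ny + j] + in[(i + 1) * ny + j]
                                + in[i * ny + j - 1] + in[i * ny + j + 1]
                                - 4.0 * in[i * ny + j];
        }
    }
}
```

Because tiling only reorders independent point updates, the blocked kernel produces exactly the same output as an untiled five-point kernel; only the memory access order changes.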

The conducted experiments demonstrate that the language extensions and the translation tool are effective at generating vectorized code that makes use of the natural data locality of stencil computations.

  • Published in

    ACM Conferences
    WPMVP'19: Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing
    February 2019
    35 pages
    ISBN:9781450362917
    DOI:10.1145/3303117

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 20 of 30 submissions, 67%
