Automatic Vectorization of Stencil Codes with the GGDML Language Extensions

Authors:
Nabeeh Jumah, Informatik Department, Universität Hamburg, Hamburg, Germany
Julian Kunkel, Computer Science Department, University of Reading, Reading, UK

WPMVP'19: Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing, February 2019, Article No. 2, Pages 1–7
https://doi.org/10.1145/3303117.3306160

Published: 16 February 2019

ABSTRACT

Partial differential equation (PDE) solvers are important for many applications. PDE solvers execute kernels that apply stencil operations over 2D and 3D grids. As PDE solvers and stencil codes are widely used in performance-critical applications, they must be well optimized.
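As an illustration of what such a stencil operation looks like (a standard textbook example, not taken from the paper), the second-order central-difference approximation of the 2D Laplacian on a uniform grid with spacing $h$ can be written as:

```latex
\nabla^2 u(x_i, y_j) \approx
  \frac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4\,u_{i,j}}{h^2}
```

Each grid point is updated from a fixed pattern of neighboring points; that pattern is the stencil. The symbols $u$, $h$, $i$, and $j$ are notation assumed here for illustration.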

Stencil computations naturally depend on neighboring grid elements. Data locality must therefore be exploited to optimize the code and to make better use of the memory bandwidth; at the same time, the vector processing capabilities of the processor must be utilized.
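To make the neighbor dependence and its interaction with vectorization concrete, the following is a minimal hand-written C sketch of a five-point stencil sweep. It is not GGDML code and not output of the translation tool described in the paper; the function name `stencil5`, the row-major layout, and the 0.25 averaging weights are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative 5-point stencil sweep (a sketch, not code from the paper).
 * The grid has nx columns and ny rows and is stored row-major, so x is
 * the unit-stride index. Boundary cells of `out` are left untouched. */
void stencil5(size_t nx, size_t ny,
              const double *restrict in, double *restrict out)
{
    assert(nx >= 3 && ny >= 3);
    for (size_t y = 1; y + 1 < ny; ++y) {
        const double *north = in + (y - 1) * nx;  /* row above   */
        const double *row   = in + y * nx;        /* current row */
        const double *south = in + (y + 1) * nx;  /* row below   */
        double *dst = out + y * nx;
        /* Unit-stride inner loop with no loop-carried dependence:
         * every iteration reads contiguous neighbors, an access
         * pattern that compilers can typically map to SIMD loads. */
        for (size_t x = 1; x + 1 < nx; ++x) {
            dst[x] = 0.25 * (row[x - 1] + row[x + 1] + north[x] + south[x]);
        }
    }
}
```

The three row pointers show the locality the abstract refers to: each output row reuses data from only three adjacent input rows. The `restrict` qualifiers tell the compiler that `in` and `out` do not alias, which is usually needed before it will vectorize the inner loop on its own.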

In this work, we investigate how effectively high-level language extensions can exploit the SIMD and vectorization features of multi-core processors and vector engines. We write a prototype application using the GGDML high-level language extensions and translate the high-level code under different configurations to investigate how efficiently the language extensions and the source-to-source translation process exploit the vector units of multi-core processors and vector engines.

The experiments demonstrate that the language extensions and the translation tool are effective at generating vectorized code that makes use of the natural data locality of stencil computations.

Published in
WPMVP'19: Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing
February 2019, 35 pages
ISBN: 9781450362917
DOI: 10.1145/3303117
Editors: Jan Eitzinger (University Erlangen-Nuremberg, Germany), Sylvain Jubertie (University of Orleans, France), Lionel Lacassagne (Sorbonne University, France), Bertrand Le Gal (Bordeaux-INP, France)
Copyright © 2019 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Earth system modeling
HPC
SIMD
Stencil computation
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 20 of 30 submissions, 67%