ABSTRACT
Partial differential equation (PDE) solvers are important for many applications. They execute kernels that apply stencil operations over 2D and 3D grids. Because PDE solvers and stencil codes are widely used in performance-critical applications, they must be well optimized.
Stencil computations naturally depend on neighboring grid elements. Data locality must therefore be exploited to optimize the code and to make better use of the memory bandwidth. At the same time, the vector processing capabilities of the processor must be utilized.
In this work, we investigate the effectiveness of using high-level language extensions to exploit the SIMD and vectorization features of multicore processors and vector engines. We write a prototype application using the GGDML high-level language extensions and translate the high-level code with different configurations. This lets us assess how efficiently the language extensions and the source-to-source translation process exploit the vector units of multicore processors and vector engines.
The experiments demonstrate that the language extensions and the translation tool are effective at generating vectorized code that makes use of the natural data locality of stencil computations.
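To illustrate the kind of kernel the abstract refers to, the following is a minimal C sketch of a 2D 5-point stencil. It is not GGDML source and not output of the translation tool. The function name `stencil5`, the averaging weights, and the row-major layout are assumptions chosen only for illustration. The inner loop walks contiguous memory, which is the access pattern a compiler can typically map onto vector units.

```c
#include <stddef.h>

/* Hypothetical example kernel: 5-point averaging stencil on an
 * ny-by-nx grid stored in row-major order. Only interior points are
 * written; boundary entries of `out` are left untouched. */
void stencil5(size_t ny, size_t nx,
              const double *restrict in, double *restrict out)
{
    for (size_t j = 1; j + 1 < ny; ++j) {
        /* Three neighboring rows are reused while sweeping row j. */
        const double *north = in + (j - 1) * nx;
        const double *row   = in + j * nx;
        const double *south = in + (j + 1) * nx;
        double *dst = out + j * nx;

        /* Unit-stride inner loop. The pragma is only a hint and is
         * ignored when OpenMP SIMD support is not enabled. */
        #pragma omp simd
        for (size_t i = 1; i + 1 < nx; ++i) {
            dst[i] = 0.2 * (row[i - 1] + row[i] + row[i + 1]
                            + north[i] + south[i]);
        }
    }
}
```

The `restrict` qualifiers state that the input and output grids do not overlap, which removes an aliasing obstacle to vectorization. Whether the loop is actually vectorized depends on the compiler and its flags.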