Multi-dimensional Homomorphisms and Their Implementation in OpenCL

Rasch, Ari; Gorlatch, Sergei

doi:10.1007/s10766-017-0508-z

Multi-dimensional Homomorphisms and Their Implementation in OpenCL

Published: 08 May 2017

Volume 46, pages 101–119, (2018)
Cite this article

International Journal of Parallel Programming Aims and scope Submit manuscript

679 Accesses
5 Citations
Explore all metrics

Abstract

Homomorphisms (traditionally defined on lists) are functions that can be parallelized by the divide-and-conquer paradigm. In this paper, we introduce an extension of the traditional homomorphism concept—multi-dimensional homomorphisms (MDHs)—which capture parallelism on multi-dimensional arrays. We propose md_hom—a new parallel pattern (a.k.a. algorithmic skeleton), based on the MDH concept, to simplify parallel programming for a broad class of applications. The md_hom pattern is general enough to subsume common parallel patterns such as map and reduce, and also more complex functions built by composing and nesting several patterns. We present a generic implementation schema for md_hom in form of an efficient, correct-by-construction OpenCL pseudocode that targets various parallel architectures such as multi-core CPU and graphics processing unit (GPU). We develop our pseudocode schema as parametrized in tuning parameters: these allow to optimize the code for different devices and input sizes by performing an automated search on the parameter space. We evaluate the schematically generated, executable OpenCL code using the example of general matrix–vector multiplication (GEMV)—an important linear algebra routine which has gained more attention recently due to its use in the application area of deep learning—on two parallel architectures—Intel CPU and NVIDIA GPU. Our performance results are competitive and in some cases even better than the hand-tuned GEMV implementations provided by the state-of-the-art libraries Intel MKL and NVIDIA cuBLAS, as well as the auto-tunable OpenCL BLAS library CLBlast.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Aldinucci, M., Danelutto, M., Drocco, M., Kilpatrick, P., Pezzi, G.P., Torquati, M.: The loop-of-stencil-reduce paradigm. In: Trustcom/BigDataSE/ISPA, 2015 IEEE, vol. 3, pp. 172–177. IEEE (2015)
Ansel, J., Kamil, S., Veeramachaneni, K., Ragan-Kelley, J., Bosboom, J., O’Reilly, U.M., Amarasinghe, S.: OpenTuner: an extensible framework for program autotuning. In: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, pp. 303–316. ACM (2014)
Cedric Nugteren: CLBlast. https://github.com/CNugteren/CLBlast (2017)
Emoto, K., Matsuzaki, K., Hu, Z., Takeichi, M.: Surrounding theorem: developing parallel programs for matrix-convolutions. In: Euro-Par 2006 Parallel Processing, pp. 605–614. Springer (2006)
Enmyren, J., Kessler, C.W.: SkePU: A multi-backend skeleton programming library for multi-GPU systems. In: Proceedings of the Fourth International Workshop on High-Level Parallel Programming and Applications, pp. 5–14. ACM (2010)
Ernsting, S., Kuchen, H.: Algorithmic skeletons for multi-core, multi-GPU systems and clusters. Int. J. High Perform. Comput. Netw. 7(2), 129–138 (2012)
Article Google Scholar
Gorlatch, S.: Extracting and implementing list homomorphisms in parallel program development. Sci. Comput. Program. 33(1), 1–27 (1999)
Article MathSciNet MATH Google Scholar
Gorlatch, S., Cole, M.: Parallel skeletons. In: Padua, D. (ed.) Encyclopedia of Parallel Computing, pp. 1417–1422. Springer (2011)
Grelck, C., Scholz, S.B.: SAC—a functional array language for efficient multi-threaded execution. Int. J. Parallel Program. 34(4), 383–427 (2006)
Article MATH Google Scholar
Intel: OpenCL Optimization Guide (2011)
Intel: Intel MKL. https://software.intel.com/en-us/intel-mkl (2016)
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675–678. ACM (2014)
Keller, G., Chakravarty, M.M., Leshchinskiy, R., Peyton Jones, S., Lippmeier, B.: Regular, shape-polymorphic, parallel arrays in Haskell. In: ACM Sigplan Notices, vol. 45, pp. 261–272. ACM (2010)
Khronos OpenCL Working Group: The OpenCL Specification. https://www.khronos.org/opencl/ (2017)
Netlib: BLAS. http://www.netlib.org/blas/ (2016)
Nugteren, C., Codreanu, V.: CLTune: a generic auto-tuner for OpenCL kernels. In: Embedded Multicore/Many-Core Systems-on-Chip (MCSoC), pp. 195–202. IEEE (2015)
NVIDIA: NVIDIA OpenCL Best Practices Guide (2015)
NVIDIA: NVIDIA cuBLAS. https://developer.nvidia.com/cublas (2016)
Sørensen, H.H.B.: High-performance matrix-vector multiplication on the GPU. In: Alexander, M. (ed.) Euro-Par 2011: Parallel Processing Workshops, pp. 377–386. Springer (2011)
Steuwer, M., Gorlatch, S.: SkelCL: a high-level extension of OpenCL for multi-GPU systems. J. Supercomput. 69(1), 25–33 (2014)
Article Google Scholar
Steuwer, M., Fensch, C., Lindley, S., Dubach, C.: Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance Opencl code. In: Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, pp. 205–217. ACM (2015)
Steuwer, M., Remmelg, T., Dubach, C.: Matrix multiplication beyond auto-tuning: rewrite-based GPU code generation. In: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, p. 15. ACM (2016)
Xu, W., Liu, Z., Wu, J., Ye, X., Jiao, S., Wang, D., Song, F., Fan, D.: Auto-tuning GEMV on many-core GPU. In: 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), pp. 30–36. IEEE (2012)

Download references

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Muenster, Münster, Germany
Ari Rasch & Sergei Gorlatch

Authors

Ari Rasch
View author publications
You can also search for this author in PubMed Google Scholar
Sergei Gorlatch
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ari Rasch.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rasch, A., Gorlatch, S. Multi-dimensional Homomorphisms and Their Implementation in OpenCL. Int J Parallel Prog 46, 101–119 (2018). https://doi.org/10.1007/s10766-017-0508-z

Download citation

Received: 04 October 2016
Accepted: 02 May 2017
Published: 08 May 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s10766-017-0508-z

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multi-dimensional Homomorphisms and Their Implementation in OpenCL

Abstract

Access this article

Similar content being viewed by others

Automatic generation of ARM NEON micro-kernels for matrix multiplication

oclCUB: an OpenCL parallel computing library for deep learning operators

MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Multi-dimensional Homomorphisms and Their Implementation in OpenCL

Abstract

Access this article

Similar content being viewed by others

Automatic generation of ARM NEON micro-kernels for matrix multiplication

oclCUB: an OpenCL parallel computing library for deep learning operators

MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation