Abstract
Kernel fusion is a widely applicable optimization for numerical libraries on heterogeneous systems. However, most automated systems capable of performing this optimization require changes to software development practices, such as language extensions or constraints on software organization and compilation. Such techniques are therefore inapplicable to preexisting software written in a language like OpenCL.
This work introduces an implementation of kernel fusion deployed entirely within the defined role of the OpenCL library implementation. As a result, programmers can benefit from the optimization without any explicit intervention, and even precompiled OpenCL applications can utilize it. Despite requiring no explicit programmer effort, our compiler delivered an average speedup of 12.3% over a range of applicable benchmarks on a target CPU platform.
© 2022 Springer Nature Switzerland AG
Cite this paper
Stratton, J.A., Krishna V. S., J., Palanisamy, J., Chinnaraju, K. (2022). Kernel Fusion in OpenCL. In: Chaves, R., et al. Euro-Par 2021: Parallel Processing Workshops. Euro-Par 2021. Lecture Notes in Computer Science, vol 13098. Springer, Cham. https://doi.org/10.1007/978-3-031-06156-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06155-4
Online ISBN: 978-3-031-06156-1