Abstract
Previous pattern-by-pattern approaches to OpenCL/CUDA memory optimization require explicit user intervention to extract kernel memory access patterns. This paper presents MAPA, an automatic memory-access-pattern analysis framework. It is based on a source-level analysis technique derived from traditional symbolic analyses, combined with a run-time pattern selection technique. We propose a formal notation for memory access patterns, analysis algorithms based on the SSA form, and a method for integrating MAPA with auto-tuners. The experimental results indicate that MAPA correctly analyzes 116 real-world OpenCL kernels from the Rodinia and Parboil benchmark suites. We also present an auto-tuner case study, Auto-Dymaxion, which exploits MAPA to automate a memory-access-pattern-based optimization approach.
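The abstract does not reproduce the paper's formal pattern notation. As a minimal illustrative sketch only (the category names and the affine-index restriction are assumptions here, not MAPA's actual algorithm), the kind of classification such an analysis performs can be shown for a one-dimensional array index of the form `a*tid + b`, where `tid` is the global work-item id:

```python
# Hedged sketch: classify the GPU memory access A[a*tid + b] by the stride
# between consecutive work-items. A stride of 0 means every work-item reads
# the same element; a stride of +/-1 means accesses coalesce into one
# transaction; larger strides waste memory bandwidth.

def classify_access(a: int, b: int) -> str:
    """Classify the access A[a*tid + b] across consecutive work-items."""
    if a == 0:
        return "uniform"              # same element for all work-items
    if abs(a) == 1:
        return "coalesced"            # consecutive elements, full coalescing
    return f"strided(stride={a})"     # gaps between neighboring accesses

# Example: A[tid] coalesces, A[4*tid + 1] is strided with stride 4.
print(classify_access(1, 0))   # coalesced
print(classify_access(4, 1))   # strided(stride=4)
print(classify_access(0, 7))   # uniform
```

A full analysis must additionally derive the coefficients `a` and `b` symbolically from the kernel source (the paper does this over the SSA form) rather than receive them as arguments.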
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (MSIT) (No. 2013R1A3A2003664), PF Class Heterogeneous High Performance Computer Development through the NRF funded by the MSIT (No. 2016M3C4A7952587), and BK21 Plus for Pioneers in Innovative Computing (Dept. of Computer Science and Engineering, SNU) through the NRF funded by the Ministry of Education (21A20151113068). ICT at Seoul National University provided research facilities for this study.
References
AMD: AMD APP SDK OpenCL optimization guide (2015). http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_OpenCL_Programming_Optimization_Guide2.pdf
Ballance, R.A., Maccabe, A.B., Ottenstein, K.J.: The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages. In: Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation, pp. 257–271 (1990)
Bauer, M., Cook, H., Khailany, B.: CudaDMA. http://lightsighter.github.io/CudaDMA/
Bauer, M., Cook, H., Khailany, B.: CudaDMA: optimizing GPU memory bandwidth via warp specialization. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (2011)
Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: A practical automatic polyhedral parallelizer and locality optimizer. In: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 101–113 (2008)
Brown, W.M., Wang, P., Plimpton, S.J., Tharrington, A.N.: Implementing molecular dynamics on hybrid high performance computers - short range forces. Comput. Phys. Commun. 182(4), 898–911 (2011)
Che, S., et al.: Rodinia: a benchmark suite for heterogeneous computing. In: Proceedings of 2009 IEEE International Symposium on Workload Characterization, pp. 44–54 (2009)
Che, S., Sheaffer, J.W., Skadron, K.: Dymaxion: optimizing memory access patterns for heterogeneous systems. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (2011)
Cytron, R., Ferrante, J., Rosen, B.K., Wegman, M.N., Zadeck, F.K.: Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13(4), 451–490 (1991)
Eklund, A., Dufort, P., Forsberg, D., LaConte, S.M.: Medical image processing on the GPU - past, present and future. Med. Image Anal. 17(8), 1073–1094 (2013)
Götz, A.W., Williamson, M.J., Xu, D., Poole, D., Le Grand, S., Walker, R.C.: Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized Born. J. Chem. Theory Comput. 8(5), 1542–1555 (2012)
Grosser, T., Groesslinger, A., Lengauer, C.: Polly - performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett. 22(4), 1250010 (2012)
Haghighat, M.R., Polychronopoulos, C.D.: Symbolic analysis for parallelizing compilers. ACM Trans. Program. Lang. Syst. 18, 477–518 (1996)
Jang, B., Schaa, D., Mistry, P., Kaeli, D.: Exploiting memory access patterns to improve memory performance in data parallel architectures. IEEE Trans. Parallel Distrib. Syst. 22(1), 105–118 (2011)
Khronos Group: SPIR generator/Clang. https://github.com/KhronosGroup/SPIR
Kim, J., Kim, H., Lee, J.H., Lee, J.: Achieving a single compute device image in OpenCL for multiple GPUs. In: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, pp. 277–288 (2011)
NVIDIA: cuDNN. https://developer.nvidia.com/cudnn
NVIDIA: CUDA C best practices guide (2015). http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/
Pop, S., Cohen, A., Bastoul, C., Girbal, S., Silber, G.A., Vasilache, N.: GRAPHITE: polyhedral analyses and optimizations for GCC. In: Proceedings of the 2006 GCC Developers Summit (2006)
Schatz, M.C., Trapnell, C., Delcher, A.L., Varshney, A.: High-throughput sequence alignment using graphics processing units. BMC Bioinform. 8(1), 1–10 (2007)
Seo, S., Lee, J., Jo, G., Lee, J.: Automatic OpenCL work-group size selection for multicore CPUs. In: Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 387–397 (2013)
Steensgaard, B.: Points-to analysis in almost linear time. In: Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 32–41 (1996)
Stratton, J.A., et al.: Optimization and architecture effects on GPU computing workload performance. In: Proceedings of Innovative Parallel Computing (InPar) (2012)
Stratton, J.A., et al.: Parboil: a revised benchmark suite for scientific and commercial throughput computing. Technical report, IMPACT-12-01, IMPACT, University of Illinois at Urbana-Champaign (2012)
Ben-Nun, T., Levy, E., Barak, A., Rubin, E.: Memory access patterns: the missing piece of the multi-GPU puzzle. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (2015)
Tomov, S., Dongarra, J., Baboulin, M.: Towards dense linear algebra for hybrid GPU accelerated manycore systems. Parallel Comput. 36(5–6), 232–240 (2010)
Tu, P., Padua, D.: Gated SSA-based demand-driven symbolic analysis for parallelizing compilers. In: Proceedings of the 9th International Conference on Supercomputing, pp. 414–423 (1995)
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Jo, G., Jung, J., Park, J., Lee, J. (2019). Memory-Access-Pattern Analysis Techniques for OpenCL Kernels. In: Rauchwerger, L. (ed.) Languages and Compilers for Parallel Computing. LCPC 2017. Lecture Notes in Computer Science, vol. 11403. Springer, Cham. https://doi.org/10.1007/978-3-030-35225-7_9
Print ISBN: 978-3-030-35224-0
Online ISBN: 978-3-030-35225-7