Abstract
Programming modern parallel processors, including multi-core CPUs and many-core GPUs (Graphics Processing Units), is a significant challenge for application developers. We propose to use the widely used programming language C++ for portable parallel programming, so that the same program can run on different target architectures. In this paper we extend our framework PACXX (Programming Accelerators in C++) with an additional compilation pass that simplifies data management for the programmer and makes the programming process less error-prone. These changes significantly reduce execution stalls caused by memory throttling. We describe the implementation of the new data layout optimization and report experimental results that confirm the advantages of our approach.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kucher, V., Gorlatch, S. (2021). Implicit Data Layout Optimization for Portable Parallel Programming in C++. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2021. Lecture Notes in Computer Science, vol 12942. Springer, Cham. https://doi.org/10.1007/978-3-030-86359-3_17
DOI: https://doi.org/10.1007/978-3-030-86359-3_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86358-6
Online ISBN: 978-3-030-86359-3
eBook Packages: Computer Science (R0)