Implicit Data Layout Optimization for Portable Parallel Programming in C++

Kucher, Vladyslav; Gorlatch, Sergei

doi:10.1007/978-3-030-86359-3_17

Vladyslav Kucher & Sergei Gorlatch

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12942)

Included in the following conference series:

International Conference on Parallel Computing Technologies

Abstract

The programming process for modern parallel processors, including multi-core CPUs and many-core GPUs (Graphics Processing Units), poses a significant challenge for application developers. We propose to use the popular programming language C++ for portable parallel programming, so that the same program can run on different target architectures. In this paper, we extend our framework PACXX (Programming Accelerators in C++) with an additional compilation pass that simplifies data management for the programmer and makes the programming process less error-prone. These changes significantly reduce execution stalls caused by memory throttling. We describe the implementation of the new data layout optimization and report experimental results that confirm the advantages of our approach.
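The abstract does not say which layout transformation the new pass applies. As a generic illustration of what a data layout change can mean, the sketch below converts an array-of-structures into a structure-of-arrays by hand in plain C++. The names `Particle`, `ParticlesSoA`, `to_soa`, `total_mass_aos`, and `total_mass_soa` are hypothetical and are not taken from PACXX; the choice of this particular transformation is itself an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type, used only for illustration (not from PACXX).
struct Particle {
    float x;
    float y;
    float mass;
};

// Structure-of-arrays counterpart: one contiguous array per field.
struct ParticlesSoA {
    std::vector<float> x;
    std::vector<float> y;
    std::vector<float> mass;
};

// Convert an array-of-structures into a structure-of-arrays.
ParticlesSoA to_soa(const std::vector<Particle>& aos) {
    ParticlesSoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.mass.reserve(aos.size());
    for (const Particle& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.mass.push_back(p.mass);
    }
    // Every field array holds exactly one entry per input element.
    assert(soa.x.size() == aos.size());
    return soa;
}

// AoS layout: consecutive mass values are sizeof(Particle) bytes apart.
float total_mass_aos(const std::vector<Particle>& aos) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < aos.size(); ++i) {
        sum += aos[i].mass;
    }
    return sum;
}

// SoA layout: consecutive mass values are adjacent in memory.
float total_mass_soa(const ParticlesSoA& soa) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < soa.mass.size(); ++i) {
        sum += soa.mass[i];
    }
    return sum;
}
```

Both summation functions return the same value. Only the memory access pattern differs, and on hardware that favors contiguous accesses this is the kind of property a layout optimization typically targets.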

Author information

Authors and Affiliations

University of Muenster, Muenster, Germany
Vladyslav Kucher & Sergei Gorlatch

Corresponding author

Correspondence to Vladyslav Kucher.

Editor information

Editors and Affiliations

Institute of Computational Mathematics and Mathematical Geophysics SB RAS, Novosibirsk, Russia
Victor Malyshkin

About this paper

Cite this paper

Kucher, V., Gorlatch, S. (2021). Implicit Data Layout Optimization for Portable Parallel Programming in C++. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2021. Lecture Notes in Computer Science, vol 12942. Springer, Cham. https://doi.org/10.1007/978-3-030-86359-3_17

DOI: https://doi.org/10.1007/978-3-030-86359-3_17
Published: 07 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86358-6
Online ISBN: 978-3-030-86359-3
eBook Packages: Computer Science, Computer Science (R0)
