
High-Level Programming for Many-Cores Using C++14 and the STL

International Journal of Parallel Programming

Abstract

Programming many-core systems with accelerators (e.g., GPUs) remains a challenging task, even for expert programmers. The current low-level approaches, OpenCL and CUDA, employ two distinct programming models: the host code for the CPU is written in C/C++ with a restricted memory model, while the device code for the accelerator is written in the device-dependent model of CUDA or OpenCL. The programmer is responsible for explicitly specifying parallelism, memory transfers, and synchronization, as well as for configuring the program and tuning its performance for a particular many-core system. This leads to long, poorly structured, and error-prone code, often with suboptimal performance. We present PACXX, an alternative, unified programming approach for accelerators. In PACXX, both host and device programs are written in the same programming language: the newest C++14 standard with the Standard Template Library (STL), including all modern features such as type inference (auto), variadic templates, generic lambda expressions, and the newly proposed parallel extensions of the STL. PACXX also includes an easy-to-use, type-safe API for multi-stage programming that allows for aggressive runtime compiler optimizations. We implement PACXX as a custom compiler (based on the Clang and LLVM frameworks) and a runtime system that together perform memory management and synchronization automatically and transparently for the programmer. We evaluate our approach by comparing it to OpenCL regarding program size and performance.
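To make the programming model concrete, the following is a minimal sketch (not verbatim code from the paper) of the style the abstract describes: a vector addition written as ordinary C++14 against the parallel extensions of the STL that PACXX builds on. The header and namespace names follow the C++ Parallelism TS proposal and are assumptions here; a PACXX toolchain would compile the same source for both host and accelerator.

    // Minimal sketch: vector addition via the proposed parallel STL.
    // Header and namespace names follow the C++ Parallelism TS; they
    // are assumptions here, not verbatim PACXX code.
    #include <cstddef>
    #include <vector>
    #include <experimental/algorithm>
    #include <experimental/execution_policy>

    int main() {
      constexpr std::size_t n = 1 << 20;
      std::vector<int> a(n, 1), b(n, 2), c(n);

      // A generic lambda (C++14) passed to a parallel algorithm; the
      // runtime decides where and how the iterations execute.
      std::experimental::parallel::transform(
          std::experimental::parallel::par,
          a.begin(), a.end(), b.begin(), c.begin(),
          [](auto x, auto y) { return x + y; });

      return c[0] == 3 ? 0 : 1;
    }

The multi-stage programming API is only named in the abstract, so the sketch below is hypothetical: the stage helper is an assumed, illustrative marker (implemented here as a plain identity function so the code compiles with any C++14 compiler), standing in for a facility that lets the runtime compiler evaluate a value once at launch time and embed it as a constant in the generated device code.

    #include <cstddef>

    // Hypothetical illustration of multi-stage programming. In a
    // staged setting, stage() would tell the runtime compiler to
    // evaluate its argument at kernel-launch time and bake the result
    // into the device code as a constant; here it is a plain identity
    // function, so this is a sketch, not the PACXX API.
    template <typename T>
    T stage(T value) { return value; }

    void saxpy(const float* x, float* y, std::size_t n, float a) {
      // `a` is fixed for the whole launch; staging it would let the
      // second compilation stage constant-fold the multiplication.
      const float a_c = stage(a);
      for (std::size_t i = 0; i < n; ++i)
        y[i] = a_c * x[i] + y[i];
    }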




Acknowledgements

We would like to thank Michel Steuwer for many fruitful discussions and Nvidia Corp. for their generous hardware donation.

Author information

Correspondence to Michael Haidl.


Cite this article

Haidl, M., Gorlatch, S. High-Level Programming for Many-Cores Using C++14 and the STL. Int J Parallel Prog 46, 23–41 (2018). https://doi.org/10.1007/s10766-017-0497-y
