DOI: 10.1145/3078155.3078187
research-article

Using SYCL as an Implementation Framework for HPX.Compute

Published: 16 May 2017

Abstract

Recent advances in High Performance Computing and ongoing research toward Exascale have been heavily supported by the introduction of dedicated, massively parallel accelerators. Programmers who want to maximize the utilization of current supercomputers must develop software that not only scales across multiple nodes but can also offload data-parallel computation to dedicated hardware such as graphics processors. The introduction of new types of hardware has been followed by the development of new languages, extensions, compilers, and libraries. Unfortunately, none of these solutions appears to be fully portable and independent of a specific vendor and type of hardware.
HPX.Compute, a programming model developed on top of HPX (a C++ standard library for concurrency and parallelism), uses existing and proposed C++ language and library features to support various types of parallelism. It aims to provide a generic interface for writing code that is portable across hardware architectures.
We have implemented a new backend for HPX.Compute based on SYCL, a Khronos standard for single-source programming of OpenCL devices in C++. We show how this runtime can be used to target OpenCL devices through our C++ API. We evaluated the performance of the new implementation on graphics processors with the STREAM benchmark and compared the results with the existing CUDA-based implementation.

Information

Published In

ACM Other conferences
IWOCL '17: Proceedings of the 5th International Workshop on OpenCL
May 2017
135 pages
ISBN:9781450352147
DOI:10.1145/3078155
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • The University of Bristol

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. C++
  2. GPGPU
  3. HPX
  4. SYCL
  5. heterogeneous programming
  6. parallel programming

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IWOCL 2017: 5th International Workshop on OpenCL
May 16 - 18, 2017
Toronto, Canada

Acceptance Rates

IWOCL '17 Paper Acceptance Rate: 15 of 29 submissions, 52%
Overall Acceptance Rate: 84 of 152 submissions, 55%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 1
Reflects downloads up to 30 Jan 2025

Cited By

  • (2024) Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach. SN Computer Science 5:4. DOI: 10.1007/s42979-024-02749-w. Online publication date: 6-Apr-2024.
  • (2023) Remote Execution of OpenCL and SYCL Applications via rOpenCL. 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 51-60. DOI: 10.1109/IPDPSW59300.2023.00020. Online publication date: May-2023.
  • (2023) rFaaS: Enabling High Performance Serverless with RDMA and Leases. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 897-907. DOI: 10.1109/IPDPS54959.2023.00094. Online publication date: May-2023.
  • (2023) Command Horizons: Coalescing Data Dependencies While Maintaining Asynchronicity. Asynchronous Many-Task Systems and Applications, pp. 13-26. DOI: 10.1007/978-3-031-32316-4_2. Online publication date: 15-Feb-2023.
  • (2023) Improving performance of SYCL applications on CPU architectures using LLVM-directed compilation flow. Concurrency and Computation: Practice and Experience 35:27. DOI: 10.1002/cpe.7810. Online publication date: 30-May-2023.
  • (2022) On the Compilation Performance of Current SYCL Implementations. Proceedings of the 10th International Workshop on OpenCL, pp. 1-12. DOI: 10.1145/3529538.3529548. Online publication date: 10-May-2022.
  • (2022) Improving performance of SYCL applications on CPU architectures using LLVM-directed compilation flow. Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 1-10. DOI: 10.1145/3528425.3529099. Online publication date: 2-Apr-2022.
  • (2022) The Celerity High-level API: C++20 for Accelerator Clusters. International Journal of Parallel Programming 50:3-4, pp. 341-359. DOI: 10.1007/s10766-022-00731-8. Online publication date: 1-Aug-2022.
  • (2020) HPX - The C++ Standard Library for Parallelism and Concurrency. Journal of Open Source Software 5:53, article 2352. DOI: 10.21105/joss.02352. Online publication date: Sep-2020.
  • (2020) EngineCL. Future Generation Computer Systems 107:C, pp. 522-537. DOI: 10.1016/j.future.2020.02.016. Online publication date: 1-Jun-2020.
