Using SYCL as an Implementation Framework for HPX.Compute

Authors:

Hartmut Kaiser

IWOCL '17: Proceedings of the 5th International Workshop on OpenCL

Article No.: 30, Pages 1 - 7

https://doi.org/10.1145/3078155.3078187

Published: 16 May 2017

Abstract

Recent advances in High Performance Computing and ongoing research toward Exascale have been heavily supported by the introduction of dedicated, massively parallel accelerators. Programmers wishing to maximize utilization of current supercomputers must develop software that not only scales across multiple nodes but is also capable of offloading data-parallel computation to dedicated hardware such as graphics processors. The introduction of new types of hardware has been followed by the development of new languages, extensions, compilers and libraries. Unfortunately, none of these solutions seems to be fully portable and independent of a specific vendor and type of hardware.

HPX.Compute is a programming model built on top of HPX, a C++ standard library for concurrency and parallelism. It uses existing and proposed C++ language and library capabilities to support various types of parallelism, and it aims to provide a generic interface for writing code that is portable between hardware architectures.

We have implemented a new backend for HPX.Compute based on SYCL, a Khronos standard for single-source programming of OpenCL devices in C++. We present how this runtime may be used to target OpenCL devices through our C++ API. We have evaluated the performance of the new implementation on graphics processors with the STREAM benchmark and compare the results with the existing CUDA-based implementation.
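For readers unfamiliar with STREAM, the following is a minimal host-side sketch of its "triad" kernel, a[i] = b[i] + scalar * c[i], written in plain standard C++. It is an illustration only: the names `stream_triad` and `triad_bytes` are hypothetical, and the code does not use the HPX.Compute or SYCL interfaces evaluated in the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// STREAM "triad" kernel: a[i] = b[i] + scalar * c[i].
// Hypothetical helper using only the C++ standard library on the host;
// this is NOT the HPX.Compute or SYCL API.
inline void stream_triad(std::vector<double>& a,
                         const std::vector<double>& b,
                         const std::vector<double>& c,
                         double scalar)
{
    // All three arrays are assumed to have the same length.
    assert(a.size() == b.size() && a.size() == c.size());
    std::transform(b.begin(), b.end(), c.begin(), a.begin(),
                   [scalar](double bi, double ci) { return bi + scalar * ci; });
}

// Bytes moved by one triad pass over n elements:
// two arrays are read (b, c) and one is written (a).
inline std::size_t triad_bytes(std::size_t n)
{
    return 3 * sizeof(double) * n;
}
```

Dividing `triad_bytes(n)` by the measured time of one `stream_triad` call gives the achieved memory bandwidth, which is the quantity a STREAM-style comparison between backends reports.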

Published In

IWOCL '17: Proceedings of the 5th International Workshop on OpenCL

May 2017

135 pages

ISBN:9781450352147

DOI:10.1145/3078155

General Chairs: Simon McIntosh-Smith (University of Bristol, UK), Ben Bergen (Los Alamos National Laboratory, USA)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

The University of Bristol

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

IWOCL 2017: 5th International Workshop on OpenCL

May 16 - 18, 2017

Toronto, Canada

Acceptance Rates

IWOCL '17 Paper Acceptance Rate: 15 of 29 submissions, 52%

Overall Acceptance Rate 84 of 152 submissions, 55%
