Abstract
Many high-end HPC systems support accelerators in their compute nodes to target a variety of workloads, including high-performance computing simulations, big data / data analytics codes, and visualization. To program both the CPU cores and the attached accelerators, users now have multiple programming models available, such as CUDA, OpenMP 4, OpenACC, and C++14, but some of these models fall short in their support for C++ on accelerators because they have difficulty supporting advanced C++ features, e.g. templates, class members, loops with iterators, lambdas, and deep copy. Typically, they either rely on unified memory, or the programming language itself is not aware of accelerators (e.g. C++14). In this paper, we explore a base-language solution called C++ Accelerated Massive Parallelism (AMP), which was developed by Microsoft and implemented by the PathScale ENZO compiler to program GPUs on a variety of HPC architectures, including OpenPOWER and Intel Xeon. We report preliminary, in-progress results of using C++ AMP to accelerate a matrix multiplication kernel and a quantum Monte Carlo application kernel, examining its expressiveness and performance on NVIDIA GPUs with the PathScale ENZO compiler. We hope that this preliminary report will provide a data point to inform the functionality needed for future C++ standards to support accelerators with discrete memory spaces.
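To give a flavor of the base-language approach discussed in the abstract, the sketch below shows a matrix multiplication kernel written in C++ AMP, closely following Microsoft's published C++ AMP matrix multiplication walkthrough. It is a minimal illustration, not the authors' benchmark code, and the function name `mxm_amp` is chosen here for illustration; compiling it requires a C++ AMP-capable compiler (e.g. Visual C++ 2012 or later, or the PathScale ENZO compiler discussed in the paper), since `<amp.h>`, `restrict(amp)`, and the `concurrency` namespace are not part of standard C++.

```cpp
#include <amp.h>
#include <vector>

// Multiply an M x W matrix A by a W x N matrix B into an M x N matrix C,
// offloading the per-element dot products to the accelerator.
void mxm_amp(int M, int N, int W,
             const std::vector<float>& A,
             const std::vector<float>& B,
             std::vector<float>& C) {
    using namespace concurrency;

    // array_view wraps host containers and manages host<->device transfers;
    // this is how C++ AMP copes with discrete memory spaces without
    // requiring unified memory.
    array_view<const float, 2> a(M, W, A);
    array_view<const float, 2> b(W, N, B);
    array_view<float, 2> c(M, N, C);
    c.discard_data();  // no need to copy C's initial contents to the device

    // One accelerator thread per output element; restrict(amp) marks the
    // lambda as compilable for the accelerator.
    parallel_for_each(c.extent, [=](index<2> idx) restrict(amp) {
        float sum = 0.0f;
        for (int k = 0; k < a.extent[1]; ++k)
            sum += a(idx[0], k) * b(k, idx[1]);
        c[idx] = sum;
    });

    c.synchronize();  // copy the result back to the host vector C
}
```

Note how the kernel is an ordinary C++ lambda operating on templated `array_view` objects: this is the expressiveness argument the paper examines, since the same features (templates, lambdas, class members) are exactly where directive-based models tend to struggle.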
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This paper is authored by an employee(s) of the United States Government and is in the public domain. Non-exclusive copying or redistribution is allowed, provided that the article citation is given and the authors and agency are clearly identified as its source.
Acknowledgements
This material is based upon work supported by the U.S. Department of Energy, Office of Science. This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Lopez, M.G., Bergstrom, C., Li, Y.W., Elwasif, W., Hernandez, O. (2016). Using C++ AMP to Accelerate HPC Applications on Multiple Platforms. In: Taufer, M., Mohr, B., Kunkel, J. (eds.) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol. 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_38
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6