Abstract
Many high-end HPC systems support accelerators in their compute nodes to target a variety of workloads, including high-performance computing simulations, big data / data analytics codes, and visualization. To program both the CPU cores and the attached accelerators, users now have multiple programming models available, such as CUDA, OpenMP 4, OpenACC, and C++14, but some of these models fall short in their support for C++ on accelerators because they have difficulty supporting advanced C++ features, e.g. templates, class members, loops with iterators, lambdas, and deep copy. Typically, they either rely on unified memory, or the programming language itself is not aware of accelerators (e.g. C++14). In this paper, we explore a base-language solution called C++ Accelerated Massive Parallelism (AMP), which was developed by Microsoft and implemented by the PathScale ENZO compiler to program GPUs on a variety of HPC architectures, including OpenPOWER and Intel Xeon. We report preliminary, in-progress results of using C++ AMP to accelerate a matrix multiplication kernel and a quantum Monte Carlo application kernel, examining its expressiveness and performance on NVIDIA GPUs with the PathScale ENZO compiler. We hope that this preliminary report will provide a data point to inform the functionality needed for future C++ standards to support accelerators with discrete memory spaces.
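To give a flavor of the base-language approach discussed in the abstract, the sketch below shows a matrix multiplication kernel written in C++ AMP, closely following Microsoft's published C++ AMP matrix multiplication walkthrough. It is a minimal illustration, not the authors' benchmark code, and the function name `mxm_amp` is chosen here for illustration; compiling it requires a C++ AMP-capable compiler (e.g. Visual C++ 2012 or later, or the PathScale ENZO compiler discussed in the paper), since `<amp.h>`, `restrict(amp)`, and the `concurrency` namespace are not part of standard C++.

```cpp
#include <amp.h>
#include <vector>

// Multiply an M x W matrix A by a W x N matrix B into an M x N matrix C,
// offloading the per-element dot products to the accelerator.
void mxm_amp(int M, int N, int W,
             const std::vector<float>& A,
             const std::vector<float>& B,
             std::vector<float>& C) {
    using namespace concurrency;

    // array_view wraps host containers and manages host<->device transfers;
    // this is how C++ AMP copes with discrete memory spaces without
    // requiring unified memory.
    array_view<const float, 2> a(M, W, A);
    array_view<const float, 2> b(W, N, B);
    array_view<float, 2> c(M, N, C);
    c.discard_data();  // no need to copy C's initial contents to the device

    // One accelerator thread per output element; restrict(amp) marks the
    // lambda as compilable for the accelerator.
    parallel_for_each(c.extent, [=](index<2> idx) restrict(amp) {
        float sum = 0.0f;
        for (int k = 0; k < a.extent[1]; ++k)
            sum += a(idx[0], k) * b(k, idx[1]);
        c[idx] = sum;
    });

    c.synchronize();  // copy the result back to the host vector C
}
```

Note how the kernel is an ordinary C++ lambda operating on templated `array_view` objects: this is the expressiveness argument the paper examines, since the same features (templates, lambdas, class members) are exactly where directive-based models tend to struggle.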
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This paper is authored by an employee(s) of the United States Government and is in the public domain. Non-exclusive copying or redistribution is allowed, provided that the article citation is given and the authors and agency are clearly identified as its source.
Acknowledgements
This material is based upon work supported by the U.S. Department of Energy, Office of Science. This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Lopez, M.G., Bergstrom, C., Li, Y.W., Elwasif, W., Hernandez, O. (2016). Using C++ AMP to Accelerate HPC Applications on Multiple Platforms. In: Taufer, M., Mohr, B., Kunkel, J. (eds.) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol. 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_38
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6