Abstract
High Performance Computing relies on accelerators (such as GPGPUs) to achieve fast execution of scientific applications. Traditionally, these accelerators have been programmed with specialized languages such as CUDA or OpenCL. In recent years, OpenMP has emerged as a promising alternative for supporting accelerators, offering advantages such as maintaining a single code base for the host and different accelerator types and providing a simple way to extend accelerator support to existing code. Using this support efficiently requires solving several challenges related to performance, work partitioning, and concurrent execution on multiple device types. In this paper, we discuss these challenges and introduce a library, HybridOMP, that addresses several of them, thus enabling the effective use of OpenMP for accelerators. We apply HybridOMP to a scientific application, PlasCom2, that has not previously been able to use accelerators. Experiments on three architectures show that HybridOMP results in performance gains of up to 10x compared to CPU-only execution. Concurrent execution on the host and GPU resulted in additional gains of up to 10% compared to running on the GPU only.
Acknowledgments
This material is based in part upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number DE-NA0002374.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Diener, M., Bodony, D.J., Kale, L. (2019). Accelerating Scientific Applications on Heterogeneous Systems with HybridOMP. In: Senger, H., et al. High Performance Computing for Computational Science – VECPAR 2018. VECPAR 2018. Lecture Notes in Computer Science(), vol 11333. Springer, Cham. https://doi.org/10.1007/978-3-030-15996-2_13
DOI: https://doi.org/10.1007/978-3-030-15996-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15995-5
Online ISBN: 978-3-030-15996-2
eBook Packages: Computer Science, Computer Science (R0)