Abstract
The popularity of heterogeneous computing continues to increase rapidly due to the high peak performance, favorable energy efficiency, and comparatively low cost of accelerators. However, heterogeneous programming models still lack the flexibility of their CPU-only counterparts. Accelerated OpenMP models, including OpenMP 4.0 and OpenACC, ease the migration of code from CPUs to GPUs but lack much of OpenMP’s flexibility: OpenMP applications can run on any number of CPUs without extra user effort, but GPU implementations neither offer similar adaptive worksharing across the GPUs in a node nor employ a mix of CPUs and GPUs. To address these shortcomings, we present CoreTSAR, our library for scheduling cores via a task-size adapting runtime system, which supports worksharing of loop nests across arbitrary heterogeneous resources. Beyond scheduling the computational load across devices, CoreTSAR includes a memory-management system that operates based on task association, enabling the runtime to dynamically manage memory movement and task granularity. Our evaluation shows that CoreTSAR can provide nearly linear scaling to four GPUs and all cores in a node without modifying the code within the parallel region. Furthermore, CoreTSAR provides portable performance across a variety of system configurations.
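To make the idea of adaptive worksharing concrete, the following is a minimal sketch of one way a runtime could divide a loop's iteration space across devices in proportion to their measured throughput and then re-balance after each pass. It is not CoreTSAR's API or its scheduling algorithm; the function names `split_iterations` and `update_rates`, the device labels, and the proportional-split policy are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Chunk = Tuple[int, int]  # half-open iteration range [start, end)


def split_iterations(total: int, rates: Dict[str, float]) -> Dict[str, Chunk]:
    """Split range(total) into contiguous chunks sized in proportion to
    each device's rate (iterations per second). Hypothetical helper."""
    if total < 0:
        raise ValueError("total must be non-negative")
    if not rates or any(r <= 0 for r in rates.values()):
        raise ValueError("every device needs a positive rate")

    rate_sum = sum(rates.values())
    devices = list(rates)
    chunks: Dict[str, Chunk] = {}
    start = 0
    cumulative = 0.0
    for i, dev in enumerate(devices):
        cumulative += rates[dev]
        if i == len(devices) - 1:
            end = total  # last device absorbs any rounding remainder
        else:
            end = round(total * cumulative / rate_sum)
        end = max(end, start)  # never produce a negative-length chunk
        chunks[dev] = (start, end)
        start = end
    return chunks


def update_rates(
    rates: Dict[str, float],
    chunks: Dict[str, Chunk],
    elapsed: Dict[str, float],
) -> Dict[str, float]:
    """Recompute each device's rate from the last pass. A device that ran
    no iterations, or reported no time, keeps its previous rate."""
    new_rates = dict(rates)
    for dev, (start, end) in chunks.items():
        done = end - start
        seconds = elapsed.get(dev, 0.0)
        if done > 0 and seconds > 0:
            new_rates[dev] = done / seconds
    return new_rates
```

In this sketch the first pass would start from equal rates, each device would time its own chunk, and the next call to `split_iterations` would use the rates returned by `update_rates`, so a faster device receives a larger share of the loop on later passes. Memory movement, which the abstract says CoreTSAR manages through task association, is deliberately left out.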
This work was supported in part by the Air Force Office of Scientific Research (AFOSR) Computational Mathematics Program via Grant No. FA9550-12-1-0442, NSF I/UCRC IIP-1266245 via the NSF Center for High-Performance Reconfigurable Computing (CHREC) and a DoD National Defense Science & Engineering Graduate Fellowship (NDSEG).
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Scogland, T.R.W., Feng, W.-c., Rountree, B., de Supinski, B.R. (2014). CoreTSAR: Adaptive Worksharing for Heterogeneous Systems. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham. https://doi.org/10.1007/978-3-319-07518-1_11
DOI: https://doi.org/10.1007/978-3-319-07518-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07517-4
Online ISBN: 978-3-319-07518-1
eBook Packages: Computer Science (R0)