Abstract
We present a workstealing scheduler and demonstrate its use in two areas: (1) enabling hierarchical parallelism and per-core load balancing in stencil codes, and (2) reducing the overhead of per-thread load balancing in particle codes.
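The sketch below illustrates the general work-stealing idea behind such a scheduler. It is not the authors' implementation. Each worker owns a double-ended queue of iteration ranges. The owner pops from the back, which favours recently split, cache-warm work. Idle workers steal from the front of a randomly chosen victim, which yields the oldest and usually largest range. All names (`parallel_for_ws`, `Range`, `WorkerQueue`, `grain`) are hypothetical. Mutex-guarded `std::deque`s stand in for the lock-free deques a production scheduler would more likely use. The loop is flat, so it does not model the hierarchical (per-core, then per-thread) structure discussed in the paper.

```cpp
#include <atomic>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// A half-open iteration range [lo, hi) treated as one stealable task.
struct Range {
    std::int64_t lo, hi;
};

// One double-ended queue per worker. A mutex keeps the sketch simple;
// real schedulers typically use lock-free deques instead.
struct WorkerQueue {
    std::mutex m;
    std::deque<Range> q;
};

// Runs body(i) for every i in [0, n) on nthreads workers. All work starts on
// worker 0 to create a deliberate imbalance; idle workers steal to correct it.
// body must be safe to call concurrently. Returns the number of successful steals.
std::int64_t parallel_for_ws(std::int64_t n, int nthreads, std::int64_t grain,
                             const std::function<void(std::int64_t)>& body) {
    if (nthreads < 1) nthreads = 1;
    if (grain < 1) grain = 1;  // a grain of 0 would split forever
    std::vector<WorkerQueue> queues(nthreads);
    std::atomic<std::int64_t> remaining{n > 0 ? n : 0};  // iterations not yet executed
    std::atomic<std::int64_t> steals{0};
    if (n > 0) queues[0].q.push_back({0, n});

    auto worker = [&](int id) {
        std::mt19937 rng(static_cast<unsigned>(id) + 1u);
        std::uniform_int_distribution<int> pick(0, nthreads - 1);
        while (remaining.load() > 0) {
            Range r{0, 0};
            bool got = false;
            {   // Owner end: pop the most recently pushed range (LIFO).
                std::lock_guard<std::mutex> lk(queues[id].m);
                if (!queues[id].q.empty()) {
                    r = queues[id].q.back();
                    queues[id].q.pop_back();
                    got = true;
                }
            }
            if (!got) {  // Thief end: take the oldest range of a random victim (FIFO).
                int victim = pick(rng);
                if (victim != id) {
                    std::lock_guard<std::mutex> lk(queues[victim].m);
                    if (!queues[victim].q.empty()) {
                        r = queues[victim].q.front();
                        queues[victim].q.pop_front();
                        got = true;
                        steals.fetch_add(1);
                    }
                }
            }
            if (!got) { std::this_thread::yield(); continue; }
            // Split large ranges: keep the lower half, expose the upper half to thieves.
            while (r.hi - r.lo > grain) {
                std::int64_t mid = r.lo + (r.hi - r.lo) / 2;
                std::lock_guard<std::mutex> lk(queues[id].m);
                queues[id].q.push_back({mid, r.hi});
                r.hi = mid;
            }
            for (std::int64_t i = r.lo; i < r.hi; ++i) body(i);
            remaining.fetch_sub(r.hi - r.lo);  // only completed iterations are retired
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join();
    return steals.load();
}
```

For example, `parallel_for_ws(1000000, 8, 256, f)` would execute `f` once per index across eight threads. Termination relies only on the shared `remaining` counter. Every iteration sits in exactly one deque, is held by exactly one worker, or has already been executed, so the count reaches zero exactly when all work is done. Because no worker ever holds two queue locks at once, the sketch cannot deadlock on them.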
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Meadows, L., Pennycook, S.J., Duran, A., Wilmarth, T., Cownie, J. (2016). Workstealing and Nested Parallelism in SMP Systems. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol. 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_4
DOI: https://doi.org/10.1007/978-3-319-45550-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1
eBook Packages: Computer Science, Computer Science (R0)