Overlapping Computations with Communications and I/O Explicitly Using OpenMP Based Heterogeneous Threading Models

  • Conference paper
OpenMP in a Heterogeneous World (IWOMP 2012)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7312)

Abstract

Holistic tuning and optimization of hybrid MPI and OpenMP applications is becoming a focus for parallel code developers as the number of cores and hardware threads in the processing nodes of high-end systems continues to increase. For example, a Cray XE6 node with Interlagos processors supports 32 hardware threads, while an IBM Blue Gene/Q node can support up to 64 threads. Note that, by default, OpenMP threads and MPI tasks are pinned to processor cores on these high-end systems, and throughout the paper we assume fixed bindings of threads to physical cores; a number of OpenMP runtimes also support user-specified bindings of threads to physical cores. Parallel and node efficiencies of hybrid MPI and OpenMP applications on these systems largely depend on balancing and overlapping computation and communication workloads. The issue is further intensified when the nodes have a non-uniform memory access (NUMA) model and I/O accelerator devices. In these environments, where access to I/O devices such as GPUs for code acceleration and network interfaces for MPI communication and parallel file I/O is managed and scheduled by a host CPU, application developers can introduce innovative solutions that overlap CPU and I/O operations to improve node and parallel efficiencies. For example, in a production-level application called BigDFT, the developers have introduced a master-slave model to explicitly overlap blocking collective communication operations with local multi-threaded computation. Similarly, some applications parallelized with MPI, OpenMP, and GPU acceleration could assign a management thread to GPU data and control orchestration and an MPI control thread to communication management while the remaining CPU threads perform overlapping calculations; a background thread could additionally be set aside for file-I/O-based fault tolerance. Considering these emerging application design needs, we would like to motivate the OpenMP standards committee, through examples and empirical results, to introduce thread and task heterogeneity into the language specification. This would allow code developers, especially those programming for large-scale distributed-memory HPC systems and accelerator devices, to design and develop portable solutions with overlapping control and data flow for their applications without resorting to custom solutions.
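
The master-slave overlap pattern described in the abstract can be expressed today only through explicit thread-id checks inside a single OpenMP parallel region. The following C sketch is illustrative only (it is not taken from the paper or from BigDFT, and the array names and sizes are assumptions): OpenMP thread 0 drives a blocking MPI collective while the remaining threads share an independent local computation, and the implicit barrier at the end of the parallel region synchronizes the two roles.

```c
/* Illustrative sketch only (assumed data and names, not the paper's code):
 * overlap a blocking MPI collective with local OpenMP computation by
 * dedicating thread 0 to communication inside one parallel region. */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define N (1 << 20)

int main(int argc, char **argv)
{
    int provided;
    /* FUNNELED suffices: only the master thread (thread 0) calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    double *local  = malloc(N * sizeof *local);   /* data to reduce         */
    double *global = malloc(N * sizeof *global);  /* result of the reduce   */
    double *work   = malloc(N * sizeof *work);    /* independent local work */
    for (int i = 0; i < N; ++i) { local[i] = 1.0; work[i] = 0.0; }

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nth = omp_get_num_threads();

        if (tid == 0) {
            /* "Communication" thread: blocking collective runs while
             * the other threads compute. */
            MPI_Allreduce(local, global, N, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);
        } else {
            /* "Computation" threads: partition the local work among
             * the nth - 1 remaining threads. */
            for (int i = tid - 1; i < N; i += nth - 1)
                work[i] = 0.5 * work[i] + 1.0;
        }
    }   /* implicit barrier: communication and computation both complete */

    free(local); free(global); free(work);
    MPI_Finalize();
    return 0;
}
```

Hand-coding such role assignments per thread id is precisely the kind of custom, non-portable solution that a heterogeneous threading model in the OpenMP specification, as argued for in the paper, would remove.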

References

  1. BigDFT code, http://inac.cea.fr/L_Sim/BigDFT/

  2. Cray XE6 system, http://www.cray.com/Products/XE/CrayXE6System.aspx

  3. Cray XK6 system, http://www.cray.com/Products/XK6/XK6.aspx

  4. Ayguade, E., Badia, R.M., Cabrera, D., Duran, A., Gonzalez, M., Igual, F., Jimenez, D., Labarta, J., Martorell, X., Mayo, R., Perez, J.M., Quintana-Ortí, E.S.: A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures. In: Müller, M.S., de Supinski, B.R., Chapman, B.M. (eds.) IWOMP 2009. LNCS, vol. 5568, pp. 154–167. Springer, Heidelberg (2009)

  5. Beyer, J.C., Stotzer, E.J., Hart, A., de Supinski, B.R.: OpenMP for Accelerators. In: Chapman, B.M., Gropp, W.D., Kumaran, K., Müller, M.S. (eds.) IWOMP 2011. LNCS, vol. 6665, pp. 108–121. Springer, Heidelberg (2011)

  6. Fatica, M.: Accelerating Linpack with CUDA on heterogeneous clusters. In: GPGPU-2 Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. ACM, New York (2009)

  7. Genovese, L., Neelov, A., Goedecker, S., Deutsch, T., Ghasemi, A., Zilberberg, O., Bergman, A., Rayson, M., Schneider, R.: Daubechies wavelets as a basis set for density functional pseudopotential calculations. J. Chem. Phys. 129, 14109 (2008)

  8. Jones, W.M., Daly, J.T., DeBardeleben, N.A.: Application Resilience: Making Progress in Spite of Failure. In: Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), pp. 789–794 (2008)

  9. Park, B.H., Naughton, T.J., Agarwal, P.K., Bernholdt, D.E., Geist, A., Tippens, J.L.: Realization of User Level Fault Tolerant Policy Management through a Holistic Approach for Fault Correlation. In: IEEE Symp. on Policies for Distributed Systems and Networks (2011)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alam, S.R., Fourestey, G., Videau, B., Genovese, L., Goedecker, S., Dugan, N. (2012). Overlapping Computations with Communications and I/O Explicitly Using OpenMP Based Heterogeneous Threading Models. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds) OpenMP in a Heterogeneous World. IWOMP 2012. Lecture Notes in Computer Science, vol 7312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30961-8_23

  • DOI: https://doi.org/10.1007/978-3-642-30961-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30960-1

  • Online ISBN: 978-3-642-30961-8

  • eBook Packages: Computer Science, Computer Science (R0)
