Integrating Asynchronous Task Parallelism with OpenSHMEM

Grossman, Max; Kumar, Vivek; Budimlić, Zoran; Sarkar, Vivek

doi:10.1007/978-3-319-50995-2_1

Conference paper
First Online: 15 December 2016

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10007)

Abstract

Partitioned Global Address Space (PGAS) programming models combine shared and distributed memory features, and provide a foundation for high-productivity parallel programming using lightweight one-sided communications. The OpenSHMEM programming interface has recently begun gaining popularity as a lightweight library-based approach for developing PGAS applications, in part through its use of a symmetric heap to realize more efficient implementations of global pointers than in other PGAS systems. However, current approaches to hybrid inter-node and intra-node parallel programming in OpenSHMEM rely on the use of multithreaded programming models (e.g., pthreads, OpenMP) that harness intra-node parallelism but are opaque to the OpenSHMEM runtime. This OpenSHMEM+X approach can encounter performance challenges such as bottlenecks on shared resources, long pause times due to load imbalances, and poor data locality. Furthermore, OpenSHMEM+X requires the expertise of hero-level programmers, compared to the use of just OpenSHMEM. All of these are hard challenges to mitigate with incremental changes. This situation will worsen as computing nodes increase their use of accelerators and heterogeneous memories.
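The symmetric-heap idea mentioned above can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions, not the OpenSHMEM API: the class `SimulatedSymmetricHeap` and its methods are hypothetical names, and each processing element (PE) is modelled as a plain byte buffer. The point it tries to show is that when allocation is collective, one offset is valid on every PE, so a "global pointer" reduces to a (PE, offset) pair.

```python
class SimulatedSymmetricHeap:
    """Toy model (hypothetical, not OpenSHMEM): each PE owns a private buffer,
    but allocations are collective, so an object has the same offset on every PE."""

    def __init__(self, num_pes, heap_bytes):
        # One private buffer per simulated PE, all of identical size.
        self.heaps = [bytearray(heap_bytes) for _ in range(num_pes)]
        self.next_free = 0

    def alloc(self, nbytes):
        """Collective allocation: returns a single offset valid on all PEs."""
        if self.next_free + nbytes > len(self.heaps[0]):
            raise MemoryError("simulated symmetric heap exhausted")
        offset = self.next_free
        self.next_free += nbytes
        return offset

    def put(self, target_pe, offset, data):
        """One-sided write: the target is named by PE id plus symmetric offset."""
        self.heaps[target_pe][offset:offset + len(data)] = data

    def get(self, source_pe, offset, nbytes):
        """One-sided read from the same symmetric offset on another PE."""
        return bytes(self.heaps[source_pe][offset:offset + nbytes])


if __name__ == "__main__":
    heap = SimulatedSymmetricHeap(num_pes=4, heap_bytes=64)
    counter = heap.alloc(8)          # same offset on PEs 0..3
    heap.put(2, counter, b"hello")   # write into PE 2 without PE 2 taking part
    print(heap.get(2, counter, 5))
```

In this model no per-object address table is needed to reach remote data, which is the efficiency argument the abstract makes for symmetric heaps; a real implementation would add the base address of each PE's heap to the offset.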

In this paper, we introduce the AsyncSHMEM PGAS library, which supports a tighter integration of shared and distributed memory parallelism than past OpenSHMEM implementations. AsyncSHMEM integrates the existing OpenSHMEM reference implementation with a thread-pool-based, intra-node, work-stealing runtime. It aims to prepare OpenSHMEM for future generations of HPC systems by enabling the use of asynchronous computation to hide data transfer latencies, supporting tight interoperability of OpenSHMEM with task parallel programming, improving load balance (of both communication and computation), and enhancing locality. We present the design of AsyncSHMEM and demonstrate the performance of our initial AsyncSHMEM implementation through a scalability analysis of two benchmarks on the Titan supercomputer. These early results are promising: they indicate that AsyncSHMEM is more programmable than the OpenSHMEM+OpenMP model, while delivering comparable performance for a regular benchmark (ISx) and superior performance for an irregular benchmark (UTS).
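To make the phrase "thread-pool-based, intra-node, work-stealing runtime" concrete, here is a minimal sketch of work stealing applied to an unbalanced tree count, loosely in the spirit of UTS. It is not AsyncSHMEM's implementation or API. The functions `children`, `count_nodes_sequential`, and `count_nodes_work_stealing` are hypothetical, the tree shape is invented for illustration, and plain Python threads with locks stand in for a real runtime (so, because of the interpreter lock, it shows the scheduling policy rather than any speedup).

```python
import threading
import time
from collections import deque


def children(node_id, depth, max_depth):
    """Hypothetical unbalanced tree: fan-out (0 to 3) depends on the node id."""
    if depth >= max_depth:
        return []
    fanout = node_id % 4
    return [node_id * 4 + k + 1 for k in range(fanout)]


def count_nodes_sequential(root, max_depth):
    """Reference answer computed with a single explicit stack."""
    stack, total = [(root, 0)], 0
    while stack:
        node, depth = stack.pop()
        total += 1
        stack.extend((kid, depth + 1) for kid in children(node, depth, max_depth))
    return total


def count_nodes_work_stealing(root, max_depth, num_workers=4):
    deques = [deque() for _ in range(num_workers)]
    locks = [threading.Lock() for _ in range(num_workers)]
    counts = [0] * num_workers      # each worker writes only its own slot
    pending = [1]                   # tasks created but not yet finished
    pending_lock = threading.Lock()
    deques[0].append((root, 0))

    def take(me):
        # Prefer the newest task in the worker's own deque (LIFO end).
        with locks[me]:
            if deques[me]:
                return deques[me].pop()
        # Otherwise steal the oldest task from another worker (FIFO end).
        for offset in range(1, num_workers):
            victim = (me + offset) % num_workers
            with locks[victim]:
                if deques[victim]:
                    return deques[victim].popleft()
        return None

    def worker(me):
        while True:
            task = take(me)
            if task is None:
                with pending_lock:
                    if pending[0] == 0:   # nothing queued or running anywhere
                        return
                time.sleep(0)             # yield, then retry stealing
                continue
            node, depth = task
            counts[me] += 1
            kids = children(node, depth, max_depth)
            # Account for the new tasks before publishing them, and retire this one.
            with pending_lock:
                pending[0] += len(kids) - 1
            with locks[me]:
                deques[me].extend((kid, depth + 1) for kid in kids)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)


if __name__ == "__main__":
    print(count_nodes_work_stealing(root=3, max_depth=10),
          count_nodes_sequential(root=3, max_depth=10))
```

All work starts on worker 0, so the other workers make progress only by stealing. Stealing from the opposite end of the deque tends to hand thieves older tasks, which in a tree search usually root larger subtrees, and this is one common reason the policy suits irregular workloads.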

Acknowledgments

This research was funded in part by the United States Department of Defense, and was supported by resources at Los Alamos National Laboratory.

Author information

Authors and Affiliations

Rice University, Houston, USA
Max Grossman, Vivek Kumar, Zoran Budimlić & Vivek Sarkar

Corresponding author

Correspondence to Max Grossman.

Editor information

Editors and Affiliations

Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Manjunath Gorentla Venkata, Neena Imam, Swaroop Pophale & Tiffany M. Mintz

About this paper

Cite this paper

Grossman, M., Kumar, V., Budimlić, Z., Sarkar, V. (2016). Integrating Asynchronous Task Parallelism with OpenSHMEM. In: Gorentla Venkata, M., Imam, N., Pophale, S., Mintz, T. (eds) OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments. OpenSHMEM 2016. Lecture Notes in Computer Science, vol 10007. Springer, Cham. https://doi.org/10.1007/978-3-319-50995-2_1

DOI: https://doi.org/10.1007/978-3-319-50995-2_1
Published: 15 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50994-5
Online ISBN: 978-3-319-50995-2
eBook Packages: Computer Science, Computer Science (R0)