Asynchronous Nested Parallelism for Dynamic Applications in Distributed Memory

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9519)

Abstract

Nested parallelism is of increasing interest for both expressivity and performance. Many problems are naturally expressed in this divide-and-conquer style. In addition, programmers with knowledge of the target architecture employ nested parallelism for performance, imposing a hierarchy on the application to increase locality and resource utilization, often at the cost of implementation complexity.

While dynamic applications are a natural fit for the approach, support for nested parallelism on distributed systems is generally limited to well-structured applications engineered with distinct phases of intra-node computation and inter-node communication. This model makes irregular applications difficult to express and hurts performance by introducing unnecessary latency and synchronization. In this paper we describe an approach to asynchronous nested parallelism that treats nested computation uniformly across distributed memory. This approach allows efficient execution while supporting dynamic applications that cannot be mapped onto the machine in the rigid manner of regular applications. We use several graph algorithms as examples to demonstrate our library’s expressivity, flexibility, and performance.
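The pattern the abstract describes can be sketched with a small divide-and-conquer reduction in which every level of the recursion may spawn further asynchronous subtasks, with no global phase barriers. This is an illustrative shared-memory analogy only: the paper's approach targets distributed memory via the STAPL runtime, and the function and parameter names below are hypothetical, not the authors' API.

```python
# Illustrative sketch only: a shared-memory analogy (Python futures) for
# asynchronous nested parallelism. The paper's actual system (STAPL-RTS)
# targets distributed memory; names here are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def nested_sum(pool, data, cutoff=8):
    """Divide-and-conquer reduction where every level may spawn
    asynchronous subtasks, with no global synchronization phases."""
    if len(data) <= cutoff:          # base case: compute locally
        return sum(data)
    mid = len(data) // 2
    # Spawn the left half as an asynchronous task; that task may itself
    # spawn further nested tasks before anyone waits on it.
    left = pool.submit(nested_sum, pool, data[:mid], cutoff)
    right = nested_sum(pool, data[mid:], cutoff)   # recurse inline
    return left.result() + right    # synchronize only this subtree

# max_workers must exceed the number of tasks blocked in result(),
# or nested blocking waits can deadlock a fixed-size thread pool.
with ThreadPoolExecutor(max_workers=32) as pool:
    print(nested_sum(pool, list(range(100))))  # prints 4950
```

Note that each `result()` call waits only on its own subtree, unlike phase-based models where all nodes synchronize between computation and communication steps.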



Acknowledgments

This research is supported in part by NSF awards CNS-0551685, CCF-0702765, CCF-0833199, CCF-1439145, CCF-1423111, CCF-0830753, IIS-0916053, IIS-0917266, EFRI-1240483, RI-1217991, by NIH NCI R25 CA090301-11, by DOE awards DE-AC02-06CH11357, DE-NA0002376, B575363, by Samsung, IBM, Intel, and by Award KUS-C1-016-04, made by King Abdullah University of Science and Technology (KAUST). This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Author information

Correspondence to Ioannis Papadopoulos.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Papadopoulos, I., Thomas, N., Fidel, A., Hoxha, D., Amato, N.M., Rauchwerger, L. (2016). Asynchronous Nested Parallelism for Dynamic Applications in Distributed Memory. In: Shen, X., Mueller, F., Tuck, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2015. Lecture Notes in Computer Science(), vol 9519. Springer, Cham. https://doi.org/10.1007/978-3-319-29778-1_7

  • DOI: https://doi.org/10.1007/978-3-319-29778-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29777-4

  • Online ISBN: 978-3-319-29778-1
