Unbalanced tree search on a manycore system using the GPI programming model

Machado, Rui; Lojewski, Carsten; Abreu, Salvador; Pfreundt, Franz-Josef

doi:10.1007/s00450-011-0163-3


Special Issue Paper
Published: 08 April 2011

Volume 26, pages 229–236 (2011)

Computer Science - Research and Development



Abstract

Recent developments in computer architecture are moving towards systems with large core counts (manycore), which expose more parallelism to applications. Some applications, known as irregular and unbalanced applications, require a dynamic and asynchronous load-balancing implementation to exploit the full performance of a manycore system. The recently established Graph500 benchmark, for example, targets such applications. The UTS benchmark characterizes the performance of such irregular and unbalanced computations with a tree-structured search space that requires continuous dynamic load balancing. GPI is a PGAS API that delivers the full performance of RDMA-enabled networks directly to the application. Its programming model focuses on one-sided asynchronous communication and on overlapping computation with communication. In this paper we address the dynamic load-balancing requirements of unbalanced applications using the GPI programming model. Using the UTS benchmark, we detail the implementation of a work-stealing algorithm with GPI and present the performance results. Our evaluation shows significant improvements over the optimized MPI version, with a maximum performance of 9.5 billion nodes per second on 3072 cores.
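To make the load-balancing problem concrete, the following is a minimal single-process sketch, not the authors' GPI implementation. It builds a small UTS-like binomial tree in which each child's descriptor is a SHA-1 hash of its parent's descriptor and child index, so the tree's shape does not depend on which worker expands which node. It then simulates work stealing among several workers in lock-step rounds. The parameters `ROOT_CHILDREN`, `M` and `Q`, the helper names (`child_id`, `expand`, `sequential_count`, `work_stealing_count`), the random-victim selection and the steal-half policy are all illustrative assumptions. In a GPI setting, the steal would presumably be carried out with one-sided RDMA operations on the victim's queue rather than through a shared Python list.

```python
import hashlib
import random

# Illustrative parameters for a small UTS-like binomial tree (assumed values,
# not one of the benchmark's standard configurations).
ROOT_CHILDREN = 200  # children of the root node
M = 4                # children of a non-root interior node
Q = 0.2              # probability that a non-root node has M children; M * Q < 1 keeps the tree finite


def child_id(parent: bytes, i: int) -> bytes:
    # A child's descriptor is a SHA-1 hash of its parent's descriptor and its
    # index, so the tree shape depends only on the root, not on traversal order.
    return hashlib.sha1(parent + i.to_bytes(4, "big")).digest()


def expand(node: bytes, is_root: bool = False) -> list:
    # Return the children of `node`. A non-root node derives a uniform value
    # in [0, 1) from the first 4 bytes of its own descriptor.
    if is_root:
        n = ROOT_CHILDREN
    else:
        u = int.from_bytes(node[:4], "big") / 2**32
        n = M if u < Q else 0
    return [child_id(node, i) for i in range(n)]


def sequential_count(root: bytes) -> int:
    # Depth-first traversal with an explicit stack; counts every node once.
    count, stack = 1, expand(root, is_root=True)
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(expand(node))
    return count


def work_stealing_count(root: bytes, workers: int = 8, seed: int = 0):
    # Simulates `workers` processes in lock-step rounds. A worker with local
    # work expands one node from the top (newest end) of its own stack; an
    # idle worker picks a random victim and steals the bottom half of the
    # victim's stack (the oldest entries, which are typically closest to the root).
    rng = random.Random(seed)
    stacks = [[] for _ in range(workers)]
    stacks[0] = expand(root, is_root=True)  # worker 0 starts with the root's children
    counts = [0] * workers
    counts[0] = 1  # the root itself is counted by worker 0
    steals = 0
    while any(stacks):
        for w in range(workers):
            if stacks[w]:
                node = stacks[w].pop()
                counts[w] += 1
                stacks[w].extend(expand(node))
            else:
                victim = rng.randrange(workers)
                half = len(stacks[victim]) // 2
                if victim != w and half > 0:
                    stacks[w] = stacks[victim][:half]
                    del stacks[victim][:half]
                    steals += 1
    return sum(counts), counts, steals


if __name__ == "__main__":
    root = hashlib.sha1(b"root").digest()  # hypothetical root seed
    total, counts, steals = work_stealing_count(root)
    print("nodes:", total, "sequential:", sequential_count(root))
    print("per-worker counts:", counts, "successful steals:", steals)
```

Because nodes are only ever moved between stacks, never copied or dropped, the parallel total must equal the sequential count. The per-worker counts show how stealing spreads an initially one-sided workload. For scale, the reported peak of 9.5 billion nodes per second on 3072 cores corresponds to roughly 3.1 million nodes per second per core.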



References

Olivier S, Huan J, Liu J, Prins J, Dinan J, Sadayappan P, Tseng C-W (2006) UTS: an unbalanced tree search benchmark. In: Proc 19th intl workshop on languages and compilers for parallel computing (LCPC), New Orleans, LA, November 2–4, 2006
Machado R, Lojewski C (2009) The Fraunhofer virtual machine: a communication library and runtime system based on the RDMA model. Comput Sci Res Dev 23(3):125–132
Kumar V, Grama AY, Vempaty NR (1994) Scalable load balancing techniques for parallel computers. J Parallel Distrib Comput 22(1):60–79
Devine KD, Boman EG, Heaphy RT, Hendrickson BA, Teresco JD, Faik J, Flaherty JE, Gervasio LG (2005) New challenges in dynamic load balancing. J Appl Numer Math 52(2–3):133–152
Devine K, Hendrickson B, Boman E, St John M, Vaughan C (2000) Design of dynamic load-balancing tools for parallel applications. In: Proc of the 14th int conference on supercomputing (ICS ’00). ACM, New York, pp 110–118. doi:10.1145/335231.335242
Chakrabarti S, Yelick K (1994) Randomized load-balancing for tree-structured computation. In: IEEE scalable high performance computing conference, pp 666–673
Blumofe R, Leiserson C (1994) Scheduling multithreaded computations by work stealing. In: Proc 35th ann symp found comp sci, pp 356–368
Frigo M, Leiserson CE, Randall KH (1998) The implementation of the Cilk-5 multithreaded language. In: Proc conference on prog language design and implementation (PLDI), ACM SIGPLAN. ACM, New York, pp 212–223
Charles P, Grothoff C, Saraswat V, Donawa C, Kielstra A, Ebcioglu K, von Praun C, Sarkar V (2005) X10: an object-oriented approach to non-uniform cluster computing. In: Proc conference on object oriented prog systems, languages, and applications (OOPSLA), pp 519–538
Cong G, Kodali S, Krishnamoorthy S, Lea D, Saraswat V, Wen T (2008) Solving irregular graph problems using adaptive work-stealing. In: Proc 37th int conference on parallel processing (ICPP), Portland, OR, September 2008
Dinan J, Olivier S, Sabin G, Prins J, Sadayappan P, Tseng C-W (2007) Dynamic load balancing of unbalanced computations using message passing. In: Proc of 6th intl workshop on performance modeling, evaluation, and optimization of parallel and distributed systems (PMEO-PDS), pp 1–8
Dinan J, Olivier S, Sabin G, Prins J, Sadayappan P, Tseng C-W (2008) A message passing benchmark for unbalanced applications. J Simul Model Pract Theory 16(9):1177–1189
UPC Consortium (2005) UPC language specifications, v1.2. Technical Report LBNL-59208, Lawrence Berkeley National Lab
Olivier S, Prins J (2008) Scalable dynamic load balancing using UPC. In: Proc of 37th int conference on parallel processing (ICPP-08), Portland, OR, September 2008
Nieplocha J, Carpenter B (1999) ARMCI: a portable remote memory copy library for distributed array libraries and compiler run-time systems. Lecture notes in computer science, vol 1586, pp 533–546
Dinan J, Krishnamoorthy S, Larkins DB, Nieplocha J, Sadayappan P (2009) Scalable work stealing. In: Proc 21st intl conference on supercomputing (SC), Portland, OR, November 14–20, 2009

Author information

Authors and Affiliations

Fraunhofer Institut Techno- und Wirtschaftsmathematik, Competence Center for High Performance Computing, Kaiserslautern, Germany
Rui Machado, Carsten Lojewski & Franz-Josef Pfreundt
University of Evora, Evora, Portugal
Salvador Abreu

Corresponding author

Correspondence to Rui Machado.

About this article

Cite this article

Machado, R., Lojewski, C., Abreu, S. et al. Unbalanced tree search on a manycore system using the GPI programming model. Comput Sci Res Dev 26, 229–236 (2011). https://doi.org/10.1007/s00450-011-0163-3

Published: 08 April 2011
Issue Date: June 2011
DOI: https://doi.org/10.1007/s00450-011-0163-3
