Abstract
Recent developments in computer architecture are moving towards systems with large core counts (Manycore), which expose more parallelism to applications. Some applications, termed irregular and unbalanced, demand dynamic and asynchronous load balancing to exploit the full performance of a Manycore system. For example, the recently established Graph500 benchmark targets such applications. The UTS benchmark characterizes the performance of such irregular and unbalanced computations with a tree-structured search space that requires continuous dynamic load balancing. GPI is a PGAS API that delivers the full performance of RDMA-enabled networks directly to the application. Its programming model focuses on the use of one-sided asynchronous communication, which allows computation and communication to overlap. In this paper we address the dynamic load balancing requirements of unbalanced applications using the GPI programming model. Using the UTS benchmark, we detail the implementation of a work stealing algorithm using GPI and present the performance results. Our performance evaluation shows significant improvements over the optimized MPI version, with a maximum performance of 9.5 billion nodes per second on 3072 cores.
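To make the idea of work stealing on an unbalanced tree concrete, the following is a minimal single-process sketch in Python. It is not the GPI implementation described in the paper and does not use any GPI or RDMA calls. The tree rule (`num_children`), the queue layout, and the steal-half policy are all assumptions chosen for illustration, and the workers are simulated in round-robin order rather than run in parallel.

```python
import hashlib
import random
from collections import deque


def num_children(node_id: bytes, depth: int, max_depth: int = 8) -> int:
    """Hypothetical rule: the child count comes from a hash of the node,
    so subtree sizes vary widely and cannot be predicted up front."""
    if depth >= max_depth:
        return 0
    if depth == 0:
        return 16  # assumed fixed branching at the root
    return hashlib.sha1(node_id).digest()[0] % 4  # 0..3 children


def children(node_id: bytes, depth: int):
    """Generate the children of a node as (id, depth) pairs."""
    return [
        (hashlib.sha1(node_id + bytes([i])).digest(), depth + 1)
        for i in range(num_children(node_id, depth))
    ]


def search_sequential(root):
    """Reference traversal: count every node with a single stack."""
    stack, visited = [root], 0
    while stack:
        node = stack.pop()
        visited += 1
        stack.extend(children(*node))
    return visited


def search_work_stealing(root, n_workers=4, seed=0):
    """Simulated work stealing: each worker owns a deque; an idle worker
    picks a random victim and takes half of the victim's oldest nodes."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(n_workers)]
    queues[0].append(root)  # all work starts on worker 0
    visited = [0] * n_workers
    while any(queues):
        for w in range(n_workers):
            q = queues[w]
            if not q:
                victim = rng.randrange(n_workers)
                vq = queues[victim]
                if victim != w and len(vq) > 1:
                    # Steal from the opposite end to the one the owner uses.
                    for _ in range(len(vq) // 2):
                        q.append(vq.popleft())
                continue
            node = q.pop()  # the owner works depth-first on its own end
            visited[w] += 1
            q.extend(children(*node))
    return visited


if __name__ == "__main__":
    root = (b"root", 0)
    print("sequential node count:", search_sequential(root))
    print("nodes per worker:", search_work_stealing(root))
```

The per-worker counts sum to the sequential node count, and their spread gives a rough picture of how well stealing redistributes the initially concentrated work. In a distributed setting the steal would be a remote operation, which is where a one-sided communication model becomes relevant.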
Cite this article
Machado, R., Lojewski, C., Abreu, S. et al. Unbalanced tree search on a manycore system using the GPI programming model. Comput Sci Res Dev 26, 229–236 (2011). https://doi.org/10.1007/s00450-011-0163-3
DOI: https://doi.org/10.1007/s00450-011-0163-3