Unbalanced tree search on a manycore system using the GPI programming model

  • Special Issue Paper
Computer Science - Research and Development

Abstract

Recent developments in computer architecture are moving towards systems with large core counts (manycore), which expose more parallelism to applications. Some applications, termed irregular and unbalanced, demand a dynamic and asynchronous load balancing implementation to utilize the full performance of a manycore system. For example, the recently established Graph500 benchmark targets such applications. The UTS benchmark characterizes the performance of such irregular and unbalanced computations with a tree-structured search space that requires continuous dynamic load balancing. GPI is a PGAS API that delivers the full performance of RDMA-enabled networks directly to the application. Its programming model focuses on the use of one-sided asynchronous communication, overlapping computation and communication. In this paper we address the dynamic load balancing requirements of unbalanced applications using the GPI programming model. Using the UTS benchmark, we detail the implementation of a work stealing algorithm using GPI and present the performance results. Our performance evaluation shows significant improvements when compared with the optimized MPI version, with a maximum performance of 9.5 billion nodes per second on 3072 cores.
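To illustrate the general idea of work stealing on an unbalanced tree, the following is a minimal single-process sketch. It is not the GPI implementation described in the paper and uses no GPI calls: the workers are simulated in a round-robin loop, and every name (`children`, `count_with_stealing`, `MAX_DEPTH`, the branching rule) is a hypothetical choice made for illustration. Each node's child count is derived from a SHA-1 digest so that the tree shape is deterministic but irregular. Idle workers pick a random victim and take half of its queued nodes from the end opposite to the one the owner works on.

```python
import hashlib
import random
from collections import deque

MAX_DEPTH = 8  # assumed cut-off so the illustrative tree stays finite
ROOT = (hashlib.sha1(b"root").digest(), 0)  # a node is a (digest, depth) pair


def children(node):
    """Return the children of a node; the count depends on the node's digest."""
    digest, depth = node
    if depth >= MAX_DEPTH:
        return []
    # Assumed branching rule: the root always has 8 children, every other
    # node has 0 to 3, which makes the subtrees very uneven in size.
    count = 8 if depth == 0 else digest[0] % 4
    return [(hashlib.sha1(digest + bytes([i])).digest(), depth + 1)
            for i in range(count)]


def count_sequential(root):
    """Reference traversal: count all nodes with a single stack."""
    stack, total = [root], 0
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(children(node))
    return total


def count_with_stealing(root, workers=4, seed=0):
    """Simulate `workers` workers, each with its own queue, in round-robin steps."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(workers)]
    queues[0].append(root)  # all work starts on worker 0
    processed = [0] * workers
    steals = 0
    while any(queues):  # stop once every queue is empty
        for w in range(workers):
            if queues[w]:
                # Owner works depth-first from the right end of its queue.
                node = queues[w].pop()
                processed[w] += 1
                queues[w].extend(children(node))
            else:
                # Idle worker: try to steal from one random victim.
                victim = rng.randrange(workers)
                if victim != w and len(queues[victim]) > 1:
                    # Take the older half from the left end of the victim.
                    for _ in range(len(queues[victim]) // 2):
                        queues[w].append(queues[victim].popleft())
                    steals += 1
    return processed, steals


if __name__ == "__main__":
    per_worker, steals = count_with_stealing(ROOT)
    print("nodes per worker:", per_worker, "steals:", steals)
    print("sequential total:", count_sequential(ROOT))
```

In a distributed setting the steal step is where one-sided communication would come in, since a thief could read a victim's queue without the victim taking part in the transfer. The sketch only models the scheduling policy, not that communication.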

Author information

Corresponding author

Correspondence to Rui Machado.

About this article

Cite this article

Machado, R., Lojewski, C., Abreu, S. et al. Unbalanced tree search on a manycore system using the GPI programming model. Comput Sci Res Dev 26, 229–236 (2011). https://doi.org/10.1007/s00450-011-0163-3

  • DOI: https://doi.org/10.1007/s00450-011-0163-3
