Analysis and performance results of computing betweenness centrality on IBM Cyclops64

The Journal of Supercomputing

Abstract

This paper presents a joint study of application and architecture to improve the performance and scalability of an irregular application, computing betweenness centrality, on the IBM Cyclops64 many-core architecture. Betweenness centrality exhibits unstructured parallelism, dynamically non-contiguous memory access, and low arithmetic intensity, all of which hinder an efficient mapping of parallel algorithms onto such many-core architectures. By identifying several key architectural features, we propose and evaluate efficient strategies for achieving scalability on a massively multithreaded many-core architecture. We demonstrate several optimization strategies, including multi-grain parallelism, just-in-time locality with an explicit memory hierarchy and non-preemptive thread execution, and fine-grain data synchronization. Compared with a conventional parallel algorithm, we obtain a 4X-50X improvement in performance and a 16X improvement in scalability on a 128-core IBM Cyclops64 simulator.
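To make the workload concrete, here is a minimal sequential sketch of betweenness centrality for unweighted graphs, following the well-known algorithm of Brandes (2001). It is not the optimized parallel implementation evaluated in the paper; the function name and the dictionary-based adjacency representation are assumptions chosen for illustration. The data-dependent neighbour lookups in the traversal are an instance of the non-contiguous memory access the abstract mentions, and the single multiply-add per edge in the accumulation phase illustrates the low arithmetic intensity.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes-style betweenness centrality for an unweighted graph.

    adj maps each vertex to a list of its neighbours (hypothetical
    representation). Scores count ordered (source, target) pairs.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: breadth-first search from s, counting shortest paths
        # (sigma) and recording shortest-path predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s] = 1
        dist[s] = 0
        order = []  # vertices in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:  # irregular, data-dependent accesses
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Path graph 0 - 1 - 2, stored symmetrically: only vertex 1 is "between".
path = {0: [1], 1: [0, 2], 2: [1]}
print(betweenness_centrality(path))  # {0: 0.0, 1: 2.0, 2: 0.0}
```

Because the example stores an undirected graph symmetrically, each unordered pair is counted twice; halving the scores gives the usual undirected values. The outer loop over sources is independent per source, which is one natural coarse-grain unit of parallelism, while the inner traversal offers finer-grain parallelism at the cost of synchronizing updates to `sigma` and `delta`.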

Author information

Corresponding author

Correspondence to Guangming Tan.

About this article

Cite this article

Tan, G., Sreedhar, V.C. & Gao, G.R. Analysis and performance results of computing betweenness centrality on IBM Cyclops64. J Supercomput 56, 1–24 (2011). https://doi.org/10.1007/s11227-009-0339-9

  • DOI: https://doi.org/10.1007/s11227-009-0339-9
