DOI: 10.1145/3075564.3075567
research-article

Work Stealing in a Shared Virtual-Memory Heterogeneous Environment: A Case Study with Betweenness Centrality

Published: 15 May 2017

ABSTRACT

This paper uses betweenness centrality as a case study to investigate efficient work stealing in a heterogeneous system environment. Betweenness centrality is an important graph-processing algorithm. It exhibits multiple levels of parallelism, which makes it an interesting target for a range of optimizations. We investigate queue-based work stealing to distribute its tasks across GPU compute units (CUs) and between the CPU and the GPU, which prior work has not done. In particular, we demonstrate how to leverage the new platform-atomic operations on AMD Accelerated Processing Units (APUs) to operate cross-device queues in a lock-free manner in shared virtual memory. To make the work-stealing runtime and the application more efficient, we apply new architectural features, including atomic operations with different memory scopes and orderings for different synchronization scenarios. We implement our solution using the Heterogeneous System Architecture (HSA). Our results show that betweenness centrality with CPU-GPU work stealing achieves an average performance improvement of 15% (up to 30%) over GPU-only execution across diverse graph inputs. Our work-stealing solution can also be applied widely to other applications. Finally, we analyze the parameters critical for queuing and stealing.
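To make the idea of queue-based work stealing over betweenness-centrality tasks concrete, here is a minimal sketch. It is not the paper's HSA implementation: it assumes that each source vertex is one task (the coarse-grained level of parallelism in Brandes' algorithm), and it swaps the platform-atomic, lock-free cross-device queues for Python threads and `collections.deque`, whose pops from either end are thread-safe in CPython. The names `brandes_source` and `betweenness_work_stealing`, and the round-robin initial distribution, are hypothetical choices made for illustration.

```python
from collections import deque
import threading


def brandes_source(adj, s):
    """Dependency scores contributed by source s (Brandes, unweighted graph)."""
    n = len(adj)
    sigma = [0] * n          # number of shortest paths from s
    dist = [-1] * n          # BFS distance from s
    preds = [[] for _ in range(n)]
    order = []               # vertices in non-decreasing distance from s
    sigma[s], dist[s] = 1, 0
    frontier = deque([s])
    while frontier:
        v = frontier.popleft()
        order.append(v)
        for w in adj[v]:
            if dist[w] < 0:                  # first visit
                dist[w] = dist[v] + 1
                frontier.append(w)
            if dist[w] == dist[v] + 1:       # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = [0.0] * n
    for w in reversed(order):                # back-propagate dependencies
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0                           # the source does not count itself
    return delta


def betweenness_work_stealing(adj, num_workers=2):
    """Distribute per-source tasks over worker queues; idle workers steal."""
    n = len(adj)
    queues = [deque() for _ in range(num_workers)]
    for s in range(n):                       # assumed round-robin distribution
        queues[s % num_workers].append(s)
    partial = [[0.0] * n for _ in range(num_workers)]
    stolen = [0] * num_workers               # each worker writes only its slot

    def next_task(wid):
        try:
            return queues[wid].pop()         # owner takes from its own tail
        except IndexError:
            pass
        for k in range(1, num_workers):      # own queue empty: try to steal
            victim = (wid + k) % num_workers
            try:
                task = queues[victim].popleft()  # thief takes from the head
            except IndexError:
                continue
            stolen[wid] += 1
            return task
        return None                          # no tasks left anywhere

    def worker(wid):
        while True:
            s = next_task(wid)
            if s is None:
                return
            delta = brandes_source(adj, s)
            for v in range(n):
                partial[wid][v] += delta[v]

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    bc = [sum(p[v] for p in partial) for v in range(n)]
    return bc, stolen
```

For the five-vertex path graph `adj = [[1], [0, 2], [1, 3], [2, 4], [3]]`, `betweenness_work_stealing(adj)` returns the scores `[0.0, 6.0, 8.0, 6.0, 0.0]`; these count ordered vertex pairs, so an undirected convention would halve them. Because no tasks are pushed after the workers start, a worker can safely exit once every queue is empty. A runtime that enqueues work dynamically would need an explicit termination check.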


Published in

CF'17: Proceedings of the Computing Frontiers Conference
May 2017
450 pages
ISBN: 9781450344876
DOI: 10.1145/3075564

Copyright © 2017 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

CF'17 Paper Acceptance Rate: 43 of 87 submissions, 49%
Overall Acceptance Rate: 240 of 680 submissions, 35%
