research-article

Affinity-aware work-stealing for integrated CPU-GPU processors

Authors:
Naila Farooqui

Georgia Institute of Technology

Georgia Institute of Technology
View Profile

,
Rajkishore Barik

Intel Labs

Intel Labs
View Profile

,
Brian T. Lewis

Intel Labs

Intel Labs
View Profile

,
Tatiana Shpeisman

Intel Labs

Intel Labs
View Profile

,
Karsten Schwan

Georgia Institute of Technology

Georgia Institute of Technology
View Profile

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingFebruary 2016Article No.: 30Pages 1–2https://doi.org/10.1145/2851141.2851194

Published:27 February 2016Publication History

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Pages 1–2

ABSTRACT

Recent integrated CPU-GPU processors like Intel's Broadwell and AMD's Kaveri support hardware CPU-GPU shared virtual memory, atomic operations, and memory coherency. This enables fine-grained CPU-GPU work-stealing, but architectural differences between the CPU and GPU hurt the performance of traditionally-implemented work-stealing on such processors. These architectural differences include different clock frequencies, atomic operation costs, and cache and shared memory latencies. This paper describes a preliminary implementation of our work-stealing scheduler, Libra, which includes techniques to deal with these architectural differences in integrated CPU-GPU processors. Libra's affinity-aware techniques achieve significant performance gains over classically-implemented work-stealing. We show preliminary results using a diverse set of nine regular and irregular workloads running on an Intel Broadwell Core-M processor. Libra currently achieves up to a 2× performance improvement over classical work-stealing, with a 20% average improvement.

References

Intel thread building blocks. URL www.threadbuildingblocks.org.Google Scholar
C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier. Starpu: a unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, 23 (2):187--198, 2011. Google ScholarDigital Library
R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. J. ACM, 46(5):720--748, Sept. 1999. ISSN 0004-5411. doi: 10.1145/324133.324234. Google ScholarDigital Library
Y. Guo, J. Zhao, V. Cave, and V. Sarkar. Slaw: A scalable locality-aware adaptive work-stealing scheduler for multi-core systems. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '10, pages 341--342, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-877-3. doi: 10.1145/1693453.1693504. URL http://doi.acm.org/10.1145/1693453.1693504. Google ScholarDigital Library
S. jai Min, C. Iancu, and K. Yelick. Hierarchical work stealing on manycore clusters. In In Fifth Conference on Partitioned Global Address Space Programming Models, 2011.Google Scholar
R. Kaleem, R. Barik, T. Shpeisman, B. T. Lewis, C. Hu, and K. Pingali. Adaptive heterogeneous scheduling for integrated gpus. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT '14, pages 151--162, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2809-8. doi: 10.1145/2628071.2628088. URL http://doi.acm.org/10.1145/2628071.2628088. Google ScholarDigital Library
J. Lee, M. Samadi, Y. Park, and S. Mahlke. Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems. In Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, PACT, 2013. Google ScholarDigital Library
C.-K. Luk, S. Hong, and H. Kim. Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pages 45--55, NY, USA, 2009. ACM. ISBN 978-1-60558-798-1. doi: 10.1145/1669112.1669121. URL http://doi.acm.org/10.1145/1669112.1669121. Google ScholarDigital Library

Recommendations

Affinity-aware work-stealing for integrated CPU-GPU processors
PPoPP '16

Recent integrated CPU-GPU processors like Intel's Broadwell and AMD's Kaveri support hardware CPU-GPU shared virtual memory, atomic operations, and memory coherency. This enables fine-grained CPU-GPU work-stealing, but architectural differences between ...
Read More
Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures

Architecture designers tend to integrate both CPUs and GPUs on the same chip to deliver energy-efficient designs. It is still an open problem to effectively leverage the advantages of both CPUs and GPUs on integrated architectures. In this work, we port ...
Read More
Optimized HPL for AMD GPU and multi-core CPU usage

The installation of the LOEWE-CSC ( http://csc.uni-frankfurt.de/csc/__ __51 ) supercomputer at the Goethe University in Frankfurt lead to the development of a Linpack which can fully utilize the installed AMD Cypress GPUs. At its core, a fast DGEMM for ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2016
420 pages
ISBN:9781450340922
DOI:10.1145/2851141
General Chair:
Rafael Asenjo
University of Málaga, Spain
,
Program Chair:
Tim Harris
Oracle Labs, Cambridge, UK
ACM SIGPLAN Notices Volume 51, Issue 8
PPoPP '16
August 2016
405 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3016078
Editor:
Matthew Fluet
Issue’s Table of Contents
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 February 2016
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate230of1,014submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 5
  Total Citations
  View Citations
- 466
  Total Downloads
- Downloads (Last 12 months)28
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Affinity-aware work-stealing for integrated CPU-GPU processors

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

ABSTRACT

References

Cited By

Recommendations

Affinity-aware work-stealing for integrated CPU-GPU processors

Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures

Optimized HPL for AMD GPU and multi-core CPU usage