DOI: 10.1145/2486159.2486175

Locality-aware task management for unstructured parallelism: a quantitative limit study

Published: 23 July 2013

Abstract

As we increase the number of cores on a processor die, the on-chip cache hierarchies that support these cores are growing larger, deeper, and more complex. As a result, non-uniform memory access effects are now prevalent even on a single chip. To reduce execution time and energy consumption, data access locality should be exploited. This is especially important for task-based programming systems, where a scheduler decides when and where on the chip the code segments, i.e., tasks, should execute. Capturing locality for structured task parallelism has been done effectively, but the more difficult case, unstructured parallelism, remains largely unsolved: little quantitative analysis exists to demonstrate the potential of locality-aware scheduling, or to guide future scheduler implementations in the most fruitful direction.
This paper quantifies the potential of locality-aware scheduling for unstructured parallelism on three different many-core processors. Our simulation results of 32-core systems show that locality-aware scheduling can bring up to 2.39x speedup over a randomized schedule, and 2.05x speedup over a state-of-the-art baseline scheduling scheme. At the same time, a locality-aware schedule reduces average energy consumption by 55% and 47%, relative to the random and the baseline schedule, respectively. In addition, our 1024-core simulation results project that these benefits will only increase: compared to 32-core executions, we see up to 1.83x additional locality benefits. To capture this potential in a practical setting, we also perform a detailed scheduler design space exploration to quantify the impact of different scheduling decisions. We further highlight the importance of locality-aware stealing, and demonstrate that a stealing scheme can exploit significant locality while performing load balancing. Over randomized stealing, our proposed scheme shows up to 2.0x speedup for stolen tasks.
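The two mechanisms the abstract emphasizes, placing tasks near their data and stealing from nearby cores first, can be illustrated with a small sketch. The C++ below is not the paper's scheduler; it is a minimal single-threaded sketch assuming each task carries a hint naming the cache domain (e.g., a group of cores sharing an L2) that likely holds its data. All names here (Task, LocalityScheduler, domain_hint) are hypothetical.

```cpp
// Minimal sketch of locality-aware task queuing plus locality-aware stealing.
// Assumption: tasks carry a "domain hint" identifying the cache domain where
// their working set was last touched. Not thread-safe; a real runtime would
// use concurrent deques with the usual owner-LIFO / thief-FIFO discipline.
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

struct Task {
    std::function<void()> run;
    std::size_t domain_hint;  // cache domain likely holding the task's data
};

class LocalityScheduler {
public:
    explicit LocalityScheduler(std::size_t num_domains) : queues_(num_domains) {}

    // Enqueue a task on the queue of its preferred cache domain,
    // so it tends to execute near its data.
    void submit(Task t) {
        queues_[t.domain_hint % queues_.size()].push_back(std::move(t));
    }

    // A worker in `domain` first drains its own queue; when empty, it steals
    // from the nearest non-empty domain rather than a random one, trading a
    // little load balance for lower data-movement cost.
    bool try_pop(std::size_t domain, Task& out) {
        if (pop_from(domain, out)) return true;
        for (std::size_t dist = 1; dist < queues_.size(); ++dist) {
            // Probe neighbors in increasing "distance"; a real runtime would
            // derive this order from the chip's actual cache topology.
            std::size_t right = (domain + dist) % queues_.size();
            std::size_t left  = (domain + queues_.size() - dist) % queues_.size();
            if (pop_from(right, out) || pop_from(left, out)) return true;
        }
        return false;  // no work anywhere
    }

private:
    bool pop_from(std::size_t d, Task& out) {
        if (queues_[d].empty()) return false;
        out = std::move(queues_[d].front());
        queues_[d].pop_front();
        return true;
    }

    std::vector<std::deque<Task>> queues_;  // one queue per cache domain
};
```

On a real chip the probe order would follow the cache hierarchy (cores sharing an L2 before those sharing only the last-level cache), which is what makes stolen tasks cheaper than under randomized stealing.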



Published In

SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
July 2013
348 pages
ISBN:9781450315722
DOI:10.1145/2486159

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. energy
  2. locality
  3. performance
  4. task scheduling
  5. task stealing

Qualifiers

  • Research-article

Conference

SPAA '13

Acceptance Rates

SPAA '13 paper acceptance rate: 31 of 130 submissions (24%)
Overall acceptance rate: 447 of 1,461 submissions (31%)



Cited By

  • (2024) Enabling HW-Based Task Scheduling in Large Multicore Architectures. IEEE Transactions on Computers 73(1):138-151. DOI: 10.1109/TC.2023.3323781
  • (2023) Taming data locality for task scheduling under memory constraint in runtime systems. Future Generation Computer Systems 143(C):305-321. DOI: 10.1016/j.future.2023.01.024
  • (2022) Data Locality in High Performance Computing, Big Data, and Converged Systems: An Analysis of the Cutting Edge and a Future System Architecture. Electronics 12(1):53. DOI: 10.3390/electronics12010053
  • (2022) TaskStream: accelerating task-parallel workloads by recovering program structure. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pages 1-13. DOI: 10.1145/3503222.3507706
  • (2022) Memory-Aware Scheduling of Tasks Sharing Data on Multiple GPUs with Dynamic Runtime Systems. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 694-704. DOI: 10.1109/IPDPS53621.2022.00073
  • (2022) Locality-Aware Scheduling of Independent Tasks for Runtime Systems. Euro-Par 2021: Parallel Processing Workshops, pages 5-16. DOI: 10.1007/978-3-031-06156-1_1
  • (2021) PAVER. ACM Transactions on Architecture and Code Optimization 18(3):1-26. DOI: 10.1145/3451164
  • (2020) Fidelity and Performance of State Fast-forwarding in Microscopic Traffic Simulations. ACM Transactions on Modeling and Computer Simulation 30(2):1-26. DOI: 10.1145/3366019
  • (2020) An Adaptive Persistence and Work-stealing Combined Algorithm for Load Balancing on Parallel Discrete Event Simulation. ACM Transactions on Modeling and Computer Simulation 30(2):1-26. DOI: 10.1145/3364218
  • (2020) CAB-MPI: Exploring Interprocess Work-Stealing towards Balanced MPI Communication. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-15. DOI: 10.1109/SC41405.2020.00040
