ABSTRACT
Data caches were introduced to GPUs to mitigate the problem of irregular memory accesses, but few studies have investigated how to exploit their full potential. In this work, we consider several important GPU applications that feature data sharing across thread blocks. We show that this sharing is poorly exploited because the current GPU runtime ignores it when scheduling thread blocks. We then present an application-level transformation that remaps thread blocks to data on the fly. With this software-level scheduler, thread blocks that share substantial data are scheduled onto the same streaming multiprocessor (SM), where they share its cache. Experiments on four benchmarks show an average speedup of 1.23X.
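The sketch below illustrates one plausible way such an on-the-fly remapping could be realized; it is not the paper's actual implementation. The assumptions are: the host has grouped data-sharing tasks into per-SM task lists (the names task_lists, task_cursors, and task_counts are hypothetical), and each block queries its SM id so that blocks resident on the same SM claim tasks from the same list and therefore reuse each other's cache lines.

```cuda
// Minimal sketch of software-level block-to-data remapping (assumed design,
// not the authors' code). Blocks on the same SM pull from the same task list.
#include <cuda_runtime.h>

__device__ unsigned smid() {
    unsigned id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));   // SM this block runs on
    return id;
}

__global__ void remapped_kernel(const int *task_lists,   // [num_sms][max_tasks], flattened
                                int *task_cursors,        // next unclaimed task per SM
                                const int *task_counts,   // number of tasks per SM
                                int max_tasks,
                                const float *data, float *out)
{
    __shared__ int task;                          // task slot claimed by this block
    unsigned sm = smid();
    if (threadIdx.x == 0)
        task = atomicAdd(&task_cursors[sm], 1);   // claim the next task for this SM
    __syncthreads();

    if (task >= task_counts[sm]) return;          // this SM's list is exhausted
    int logical_block = task_lists[sm * max_tasks + task];

    // Use logical_block in place of blockIdx.x, so data-sharing blocks
    // (grouped into the same per-SM list) hit the same cached lines.
    int i = logical_block * blockDim.x + threadIdx.x;
    out[i] = data[i] * 2.0f;                      // placeholder computation
}
```

In this scheme the launch configuration stays unchanged; only the mapping from hardware block IDs to data partitions is redirected through the per-SM lists, which is what lets data-sharing work land on a common SM cache.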
Index Terms
- Software-level scheduling to exploit non-uniformly shared data cache on GPGPU