Research article. DOI: 10.1145/3061639.3062320

Statistical Pattern Based Modeling of GPU Memory Access Streams

Published: 18 June 2017

Abstract

Recent studies have shown that modern GPU performance is often limited by the performance of the memory system. Optimizing the memory hierarchy requires GPU designers to draw design insights from the cache and memory behavior of end-user applications. Unfortunately, it is often difficult to get access to end-user workloads because the software and data are confidential or proprietary. Furthermore, the efficiency of early design space exploration of cache and memory systems is often limited by either the slow speed of detailed simulation techniques or the limited scope of state-of-the-art analytical cache models.
To enable efficient GPU memory system exploration, we present a novel methodology and framework that statistically models the locality of the GPU memory access stream. The proposed G-MAP (GPU Memory Access Proxy) framework models the regularity in code-localized memory access patterns of GPGPU applications and the parallelism in the GPU's execution model to create miniaturized memory proxies. We evaluate G-MAP using 18 GPGPU benchmarks and show that G-MAP proxies can replicate the cache and memory performance of the original applications with over 90% accuracy across more than 5000 different L1/L2 cache, prefetcher, and memory configurations.
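
The abstract does not describe G-MAP's profiling or proxy generation steps, so the following is only a minimal sketch of the general idea of a code-localized statistical memory proxy, not the authors' method. It assumes a trace of `(pc, address)` pairs, records a stride histogram per static instruction (PC), and then regenerates a synthetic stream by sampling strides from those histograms. The names `profile_strides` and `generate_proxy` and the trace format are hypothetical.

```python
import random
from collections import Counter, defaultdict


def profile_strides(trace):
    """Collect, per PC, the first address seen and a histogram of strides.

    `trace` is an iterable of (pc, address) pairs (an assumed format).
    """
    last_addr = {}
    first_addr = {}
    strides = defaultdict(Counter)
    for pc, addr in trace:
        if pc in last_addr:
            # Stride between consecutive accesses issued by the same PC.
            strides[pc][addr - last_addr[pc]] += 1
        else:
            first_addr[pc] = addr
        last_addr[pc] = addr
    return first_addr, strides


def generate_proxy(first_addr, strides, pc_order, seed=0):
    """Emit a synthetic (pc, address) stream that follows `pc_order`.

    Each PC starts at its recorded first address; later accesses add a
    stride sampled from that PC's histogram (stride 0 if none was seen).
    """
    rng = random.Random(seed)
    current = {}
    proxy = []
    for pc in pc_order:
        if pc not in current:
            current[pc] = first_addr[pc]
        else:
            hist = strides.get(pc)
            if hist:
                values = list(hist.keys())
                weights = list(hist.values())
                current[pc] += rng.choices(values, weights=weights, k=1)[0]
        proxy.append((pc, current[pc]))
    return proxy


if __name__ == "__main__":
    # Two interleaved regular streams: one PC strides by 128 bytes, one by 4.
    trace = []
    for i in range(8):
        trace.append((0x10, 0x1000 + 128 * i))
        trace.append((0x20, 0x8000 + 4 * i))
    first, hist = profile_strides(trace)
    proxy = generate_proxy(first, hist, [pc for pc, _ in trace])
    print(proxy == trace)
```

When every PC has a single stride, as in the usage example, the sampled stream matches the original trace exactly. With irregular strides the proxy only preserves the per-PC stride distribution, and a real framework would also need to model GPU-specific effects such as thread-level parallelism, which this sketch ignores.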

    Published In

    DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017
    June 2017
    533 pages
    ISBN:9781450349277
    DOI:10.1145/3061639

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    DAC '17
    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
