ABSTRACT
Analytical models enable computer architects to perform early-stage design space exploration orders of magnitude faster than cycle-level simulators. To facilitate rapid design space exploration for graphics processing units (GPUs), prior studies have proposed GPU analytical models that capture the first-order stall events causing performance degradation; however, the existing analytical models cannot accurately model modern GPUs due to their outdated and highly abstract assumptions about the GPU core microarchitecture. Accurately evaluating the performance of modern GPUs therefore requires a new GPU analytical model that captures the stall events incurred by the significant changes in the core microarchitectures of modern GPUs.
We propose GCoM, an accurate GPU analytical model which faithfully captures the key core-side stall events of modern GPUs. Through detailed microarchitecture-driven GPU core modeling, GCoM accurately models modern GPUs by revealing the following key core-side stalls overlooked by the existing GPU analytical models. First, GCoM identifies the compute structural stalls caused by the limited per-sub-core functional units. Second, GCoM exposes the memory structural stalls due to the limited banks and shared nature of per-core L1 data caches. Third, GCoM correctly predicts the memory data stalls induced by sectored L1 data caches, which split a cache line into a set of sectors sharing the same tag. Fourth, GCoM captures the idle stalls incurred by inter- and intra-core load imbalances. Our experiments using an NVIDIA RTX 2060 configuration show that GCoM greatly improves the modeling accuracy, achieving a mean absolute error of 10.0% against the Accel-Sim cycle-level simulator, whereas the state-of-the-art GPU analytical model achieves a mean absolute error of 44.9%.
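To illustrate the sectored-cache behavior described above, here is a minimal sketch of one cache line in a sectored cache; the class name, sector count, and return labels are hypothetical illustrations, not GCoM's or NVIDIA's implementation. The key point is that a line keeps a single shared tag with per-sector valid bits, so a tag match alone is not enough for a hit: the requested sector must also have been filled, and a miss fetches only the missing sector rather than the whole line.

```python
class SectoredCacheLine:
    """One line of a sectored cache: a single tag shared by several sectors.

    A request hits only when the tag matches AND the requested sector has
    already been filled; on a tag match with an unfilled sector, only that
    sector is fetched (a "sector miss"), not the entire line.
    """

    def __init__(self, num_sectors=4):
        self.tag = None
        self.valid = [False] * num_sectors  # per-sector valid bits

    def access(self, tag, sector):
        if self.tag == tag and self.valid[sector]:
            return "hit"
        if self.tag != tag:
            # Tag mismatch: install the new tag, invalidate all sectors,
            # and fill only the requested one.
            self.tag = tag
            self.valid = [False] * len(self.valid)
            self.valid[sector] = True
            return "line miss"
        # Tag matches but this sector has not been filled yet.
        self.valid[sector] = True
        return "sector miss"


line = SectoredCacheLine()
print(line.access(tag=0x1A, sector=0))  # line miss: cold line
print(line.access(tag=0x1A, sector=1))  # sector miss: same tag, unfilled sector
print(line.access(tag=0x1A, sector=1))  # hit: tag and sector both present
```

Under this model, a warp whose accesses are scattered across the sectors of a few lines can suffer many sector misses even with high tag locality, which is precisely the class of memory data stalls the abstract says non-sectored cache models fail to predict.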
GCoM: A Detailed GPU Core Model for Accurate Analytical Modeling of Modern GPUs