DOI:10.1145/3545008.3545091
research-article
Open access

From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus

Published: 13 January 2023

Abstract

High-throughput RTL simulation is critical for verifying today’s highly complex SoCs. Recent research has explored accelerating RTL simulation with event-driven approaches or partitioning heuristics that speed up simulation on a single stimulus. To further increase throughput, industry-quality functional verification signoff must run multiple stimuli (i.e., batch stimulus) simultaneously, using either directed tests or random inputs. In this paper, we propose RTLflow, a GPU-accelerated RTL simulation flow with batch stimulus. RTLflow first transpiles RTL into CUDA kernels, each of which simulates one partition of the RTL across multiple stimuli simultaneously. It also leverages CUDA Graph and pipeline scheduling for efficient runtime execution. In experiments on a large industrial design (NVDLA) with 65536 stimuli, we show that RTLflow running on a single A6000 GPU can achieve a 40× runtime speed-up over an 80-thread multi-core CPU baseline.
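The batch-stimulus idea can be illustrated with a minimal CPU-side C++ sketch. It is not RTLflow's generated code: the toy design (an 8-bit accumulator with an enable input), the `BatchState` layout, and the function names are all hypothetical. Each per-partition function plays the role of one kernel, and the loop over `i` stands in for the GPU threads, with one thread per stimulus.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy design: an 8-bit accumulator with an enable input.
// Signals are stored as one array per signal, so stimulus i owns
// element i of every array.
struct BatchState {
    std::vector<std::uint8_t> in;   // data input, one value per stimulus
    std::vector<std::uint8_t> en;   // enable input, one value per stimulus
    std::vector<std::uint8_t> sum;  // combinational adder output
    std::vector<std::uint8_t> acc;  // register, reset to 0
    explicit BatchState(std::size_t n) : in(n), en(n), sum(n), acc(n) {}
};

// Partition 0 (assumed split): combinational logic, sum = acc + in (mod 256).
void eval_comb(BatchState& s, std::size_t i) {
    s.sum[i] = static_cast<std::uint8_t>(s.acc[i] + s.in[i]);
}

// Partition 1 (assumed split): register update on the clock edge.
void eval_seq(BatchState& s, std::size_t i) {
    if (s.en[i]) s.acc[i] = s.sum[i];
}

// Simulate one clock cycle for the whole batch. Each loop corresponds to
// one kernel launch; each iteration corresponds to one GPU thread.
void step_cycle(BatchState& s) {
    const std::size_t n = s.acc.size();
    for (std::size_t i = 0; i < n; ++i) eval_comb(s, i);
    for (std::size_t i = 0; i < n; ++i) eval_seq(s, i);
}

// Run `cycles` clock cycles with constant inputs per stimulus and return
// the final register value of every stimulus.
std::vector<std::uint8_t> run_batch(const std::vector<std::uint8_t>& inputs,
                                    const std::vector<std::uint8_t>& enables,
                                    int cycles) {
    assert(inputs.size() == enables.size());
    BatchState s(inputs.size());
    s.in = inputs;
    s.en = enables;
    for (int c = 0; c < cycles; ++c) step_cycle(s);
    return s.acc;
}
```

Calling `run_batch` with inputs {1, 2, 3}, enables {1, 0, 1}, and 4 cycles simulates three independent stimuli in lockstep. Storing each signal as a contiguous array across stimuli is an assumption made here because neighboring GPU threads would then touch neighboring memory; the abstract does not state the layout RTLflow uses.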


Published In

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
August 2022
976 pages
ISBN:9781450397339
DOI:10.1145/3545008

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP '22
ICPP '22: 51st International Conference on Parallel Processing
August 29 - September 1, 2022
Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%


Cited By

  • (2025) Open-source ROS-based simulation for verification of FPGA robotics applications. Microprocessors and Microsystems 113 (105143). DOI:10.1016/j.micpro.2025.105143. Online publication date: Mar-2025
  • (2024) GSAP: A GPU-Accelerated Stochastic Graph Partitioner. Proceedings of the 53rd International Conference on Parallel Processing (565-575). DOI:10.1145/3673038.3673117. Online publication date: 12-Aug-2024
  • (2024) Hestia: An Efficient Cross-Level Debugger for High-Level Synthesis. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (765-779). DOI:10.1109/MICRO61859.2024.00062. Online publication date: 2-Nov-2024
  • (2024) BatchSim: Parallel RTL Simulation Using Inter-Cycle Batching and Task Graph Parallelism. 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) (789-793). DOI:10.1109/ISVLSI61997.2024.00155. Online publication date: 1-Jul-2024
  • (2024) A Resource-Efficient Task Scheduling System Using Reinforcement Learning. Proceedings of the 29th Asia and South Pacific Design Automation Conference (89-95). DOI:10.1109/ASP-DAC58780.2024.10473960. Online publication date: 22-Jan-2024
  • (2024) Viper: Utilizing Hierarchical Program Structure to Accelerate Multi-Core Simulation. IEEE Access 12 (17669-17678). DOI:10.1109/ACCESS.2024.3354069. Online publication date: 2024
  • (2024) TaroRTL: Accelerating RTL Simulation Using Coroutine-Based Heterogeneous Task Graph Scheduling. Euro-Par 2024: Parallel Processing (151-166). DOI:10.1007/978-3-031-69583-4_11. Online publication date: 26-Aug-2024
  • (2023) Khronos: Fusing Memory Access for Improved Hardware RTL Simulation. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (180-193). DOI:10.1145/3613424.3614301. Online publication date: 28-Oct-2023
  • (2023) Neural Network Compiler for Parallel High-Throughput Simulation of Digital Circuits. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (613-623). DOI:10.1109/IPDPS54959.2023.00067. Online publication date: May-2023
  • (2023) ERAS: A Flexible and Scalable Framework for Seamless Integration of RTL Models with Structural Simulation Toolkit. 2023 IEEE International Symposium on Workload Characterization (IISWC) (196-200). DOI:10.1109/IISWC59245.2023.00038. Online publication date: 1-Oct-2023
