DOI: 10.1145/3400302.3415631

GPU-accelerated static timing analysis

Published: 17 December 2020

Abstract

The ever-increasing power of graphics processing units (GPUs) has opened new opportunities for accelerating static timing analysis (STA). Developing a CPU-GPU parallel STA engine is nonetheless extremely challenging: we must account for both the unique problem characteristics of STA and the distinct performance models of the CPU and the GPU, each of which requires careful decomposition to benefit from heterogeneous parallelism. In this paper, we propose an efficient implementation for accelerating STA on a GPU. We leverage task-based approaches to decompose the STA workload into dependent CPU and GPU tasks so that kernel computation and data processing overlap effectively. We develop GPU-efficient data structures and high-performance kernels to speed up various STA tasks, including levelization, delay calculation, and graph update. Our acceleration framework is flexible and adaptive: when tasks are scarce, as in incremental timing, we run in the normal CPU mode, and we enable the GPU when tasks are massive. We have implemented our algorithms on top of OpenTimer and demonstrated promising speed-up on large designs. For example, we achieved up to 3.69× speed-up on a large design of 1.6M gates and 1.6M nets using one GPU.
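
To make two ideas from the abstract concrete, the following is a minimal CPU-side sketch in Python, not the paper's GPU implementation. It assumes the timing graph is a DAG given as an edge list, and it uses plain topological layering to stand in for levelization: nodes in the same level do not depend on each other, so they could be processed in parallel. The names `levelize`, `choose_mode`, and `GPU_TASK_THRESHOLD` are hypothetical, and the threshold value is an arbitrary placeholder rather than a figure from the paper.

```python
from collections import defaultdict

def levelize(num_nodes, edges):
    """Group the nodes of a DAG into levels.

    Every node appears one level after its last-finishing predecessor,
    so nodes within a level have no dependencies on each other.
    """
    fanout = defaultdict(list)
    indegree = [0] * num_nodes
    for u, v in edges:
        fanout[u].append(v)
        indegree[v] += 1

    # Level 0 holds the nodes with no incoming edges (e.g. primary inputs).
    frontier = [n for n in range(num_nodes) if indegree[n] == 0]
    levels = []
    visited = 0
    while frontier:
        levels.append(frontier)
        visited += len(frontier)
        next_frontier = []
        for u in frontier:
            for v in fanout[u]:
                indegree[v] -= 1
                if indegree[v] == 0:  # all predecessors already placed
                    next_frontier.append(v)
        frontier = next_frontier

    if visited != num_nodes:
        raise ValueError("graph contains a cycle; cannot levelize")
    return levels

# Hypothetical cut-off; a real engine would tune this empirically.
GPU_TASK_THRESHOLD = 10_000

def choose_mode(num_tasks, threshold=GPU_TASK_THRESHOLD):
    """Pick the CPU path for small updates and the GPU path for large ones."""
    return "gpu" if num_tasks >= threshold else "cpu"

if __name__ == "__main__":
    edges = [(0, 2), (1, 2), (2, 3), (2, 4)]
    print(levelize(5, edges))
    print(choose_mode(5), choose_mode(1_600_000))
```

In this sketch the dispatch decision depends only on the number of pending tasks, which mirrors the abstract's distinction between incremental timing (few tasks) and full updates on large designs (many tasks); the actual criterion used by the authors is not stated in the abstract.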


Published In

ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design
November 2020
1396 pages
ISBN:9781450380263
DOI:10.1145/3400302
General Chair: Yuan Xie

Sponsors

In-Cooperation

  • IEEE CAS
  • IEEE CEDA
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

ICCAD '20
Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Article Metrics

  • Downloads (Last 12 months): 160
  • Downloads (Last 6 weeks): 17
Reflects downloads up to 17 Feb 2025

Cited By

  • (2025) A Unified Parallel Framework for LUT Mapping and Logic Optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 44:1, pp. 214-226. DOI: 10.1109/TCAD.2024.3429079. Online publication date: Jan-2025
  • (2024) Heterogeneous Static Timing Analysis with Advanced Delay Calculator. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6. DOI: 10.23919/DATE58400.2024.10546507. Online publication date: 25-Mar-2024
  • (2024) Massively Parallel AIG Resubstitution. Proceedings of the 61st ACM/IEEE Design Automation Conference, pp. 1-6. DOI: 10.1145/3649329.3655987. Online publication date: 23-Jun-2024
  • (2024) GCS-Timer: GPU-Accelerated Current Source Model Based Static Timing Analysis. Proceedings of the 61st ACM/IEEE Design Automation Conference, pp. 1-6. DOI: 10.1145/3649329.3655983. Online publication date: 23-Jun-2024
  • (2024) An Experimental Study of Dynamic Task Graph Parallelism for Large-Scale Circuit Analysis Workloads. 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 766-770. DOI: 10.1109/ISVLSI61997.2024.00149. Online publication date: 1-Jul-2024
  • (2024) Analyzing Timing in Shorter Time: A Journey Through Heterogeneous Parallelism for Static Timing Analysis. 2024 IEEE 17th International Conference on Solid-State & Integrated Circuit Technology (ICSICT), pp. 1-4. DOI: 10.1109/ICSICT62049.2024.10831648. Online publication date: 22-Oct-2024
  • (2024) A Resource-efficient Task Scheduling System using Reinforcement Learning: Invited Paper. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 89-95. DOI: 10.1109/ASP-DAC58780.2024.10473960. Online publication date: 22-Jan-2024
  • (2024) Performance (Timing) Analysis. FPGA EDA, pp. 73-78. DOI: 10.1007/978-981-99-7755-0_5. Online publication date: 1-Feb-2024
  • (2023) Fast STA Graph Partitioning Framework for Multi-GPU Acceleration. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6. DOI: 10.23919/DATE56975.2023.10137050. Online publication date: Apr-2023
  • (2023) CRP2.0: A Fast and Robust Cooperation between Routing and Placement in Advanced Technology Nodes. ACM Transactions on Design Automation of Electronic Systems, 28:5, pp. 1-42. DOI: 10.1145/3590962. Online publication date: 5-Apr-2023
