GPU-accelerated static timing analysis

Authors:

Tsung-Wei Huang,

Yibo Lin

ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design

Article No.: 147, Pages 1 - 9

https://doi.org/10.1145/3400302.3415631

Published: 17 December 2020

Abstract

The ever-increasing power of graphics processing units (GPUs) has opened new opportunities for accelerating static timing analysis (STA). Developing a CPU-GPU parallel STA engine is an extremely challenging job. We need to consider the unique problem characteristics of STA and the distinct performance models of the CPU and the GPU, both of which require very strategic decomposition to benefit from heterogeneous parallelism. In this paper, we propose an efficient implementation for accelerating STA on a GPU. We leverage task-based approaches to decompose the STA workload into CPU-GPU dependent tasks where kernel computation and data processing overlap effectively. We develop GPU-efficient data structures and high-performance kernels to speed up various tasks of STA, including levelization, delay calculation, and graph update. Our acceleration framework is flexible and adaptive. When tasks are scarce, as in incremental timing, we run in the normal CPU mode, and we enable the GPU when tasks are massive. We have implemented our algorithms on top of OpenTimer and demonstrated promising performance speed-up on large designs. As an example, we achieved up to 3.69× speed-up on a large design of 1.6M gates and 1.6M nets using one GPU.
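To make the levelization and mode-selection ideas in the abstract concrete, here is a minimal CPU-side sketch in Python. It is not the paper's implementation: the abstract does not describe the GPU kernels or data structures, so this only illustrates the general technique of grouping a timing graph into levels so that all nodes within a level can be processed independently. The function names, the toy graph, and the `gpu_threshold` value are all hypothetical.

```python
from collections import defaultdict


def levelize(num_nodes, edges):
    """Group the nodes of a DAG into levels with a Kahn-style sweep.

    A node is released once all of its fanins have been processed, so
    every fanin of a node in level k lies in some level below k.
    """
    fanout = defaultdict(list)
    indegree = [0] * num_nodes
    for u, v in edges:
        fanout[u].append(v)
        indegree[v] += 1

    frontier = [n for n in range(num_nodes) if indegree[n] == 0]
    levels = []
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for u in frontier:
            for v in fanout[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    next_frontier.append(v)
        frontier = next_frontier

    if sum(len(level) for level in levels) != num_nodes:
        raise ValueError("graph contains a cycle")
    return levels


def propagate_arrival(num_nodes, edges, delay, levels):
    """Late-mode arrival time: max over fanins of (arrival + edge delay)."""
    fanin = defaultdict(list)
    for u, v in edges:
        fanin[v].append(u)

    arrival = [0.0] * num_nodes
    for level in levels:
        # Nodes in one level do not depend on each other; this inner loop
        # is the part a GPU kernel could run with one thread per node.
        for v in level:
            if fanin[v]:
                arrival[v] = max(arrival[u] + delay[(u, v)] for u in fanin[v])
    return arrival


def choose_mode(num_tasks, gpu_threshold=10_000):
    """Hypothetical policy: use the GPU only when there is enough work."""
    return "gpu" if num_tasks >= gpu_threshold else "cpu"


# Toy graph (assumed for illustration): nodes 0 and 1 are primary inputs.
EDGES = [(0, 2), (1, 2), (2, 3), (1, 3)]
DELAY = {(0, 2): 1.0, (1, 2): 2.0, (2, 3): 1.5, (1, 3): 0.5}

LEVELS = levelize(4, EDGES)
ARRIVAL = propagate_arrival(4, EDGES, DELAY, LEVELS)
```

On the toy graph, `LEVELS` is `[[0, 1], [2], [3]]` and `ARRIVAL` is `[0.0, 0.0, 2.0, 3.5]`. A small incremental update touching a handful of nodes would stay in CPU mode under the assumed threshold, while a full update on a design with over a million nodes would select the GPU.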

Published In

ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design

November 2020

1396 pages

ISBN: 9781450380263

DOI: 10.1145/3400302

General Chair: Yuan Xie, Univ. of California, Santa Barbara, CA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

In-Cooperation

IEEE CAS
IEEE CEDA
IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICCAD '20

Sponsor:

SIGDA

ICCAD '20: IEEE/ACM International Conference on Computer-Aided Design

November 2 - 5, 2020

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Article Metrics

Total Citations: 43

Total Downloads: 540

Downloads (Last 12 months): 160

Downloads (Last 6 weeks): 17

Reflects downloads up to 17 Feb 2025
