A Formal Instruction-level GPU Model for Scalable Verification

Authors:

Sharad Malik

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pages 1 - 8

https://doi.org/10.1145/3240765.3240771

Published: 05 November 2018

Abstract

GPUs have been widely used to accelerate big-data inference applications and scientific computing through their parallelized hardware resources and programming model. Their extreme parallelism increases the possibility of bugs such as data races and uncoalesced memory accesses, so verifying program correctness is critical. State-of-the-art GPU program verification efforts mainly focus on analyzing application-level programs, e.g., in C, and suffer from the following limitations: (1) a high false-positive rate due to coarse-grained abstraction of synchronization primitives, (2) the high complexity of reasoning about pointer arithmetic, and (3) the difficulty of keeping up with an evolving API for developing application-level programs. In this paper, we address these limitations by modeling GPUs and reasoning about programs at the instruction level. We formally model the Nvidia GPU at the Parallel Thread Execution (PTX) level using the recently proposed Instruction-Level Abstraction (ILA) model for accelerators. PTX is analogous to the Instruction-Set Architecture (ISA) of a general-purpose processor. Our formal ILA model of the GPU includes non-synchronization instructions as well as all synchronization primitives, enabling us to verify multithreaded programs. We demonstrate the applicability of our ILA model to scalable GPU program verification through data-race checking. The evaluation shows that our checker outperforms state-of-the-art GPU data-race checkers, with fewer false positives and improved scalability.
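To make the notion of a data race between threads separated (or not) by a synchronization primitive concrete, the following is a minimal sketch in Python. It is not the paper's ILA model or checker. It assumes a toy straight-line per-thread program in which every thread executes the same barriers, and the names `Access`, `trace`, and `find_races` are hypothetical, introduced only for illustration. Two accesses are reported as a race when they come from different threads, fall between the same pair of barriers, touch the same address, and at least one of them is a write.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    thread: int     # thread id within one block
    interval: int   # number of barriers this thread passed before the access
    addr: int       # shared-memory address touched
    is_write: bool  # True for a store, False for a load


def trace(program, thread):
    """Collect the accesses of one thread in a toy straight-line program.

    Each op is ("bar",) for a barrier, or ("ld" | "st", addr_fn) where
    addr_fn maps a thread id to an address. This format is an assumption
    made for this sketch, not PTX syntax.
    """
    interval = 0
    accesses = []
    for op in program:
        if op[0] == "bar":
            interval += 1  # later accesses belong to the next barrier interval
        else:
            accesses.append(Access(thread, interval, op[1](thread), op[0] == "st"))
    return accesses


def find_races(program, num_threads):
    """Return conflicting access pairs not ordered by a barrier."""
    accesses = [a for t in range(num_threads) for a in trace(program, t)]
    return [
        (a, b)
        for a, b in combinations(accesses, 2)
        if a.thread != b.thread
        and a.interval == b.interval
        and a.addr == b.addr
        and (a.is_write or b.is_write)
    ]


# Each thread writes its own slot, then reads its neighbour's slot.
racy = [("st", lambda t: t), ("ld", lambda t: (t + 1) % 4)]
# Same program with a barrier between the write and the read.
safe = [("st", lambda t: t), ("bar",), ("ld", lambda t: (t + 1) % 4)]

print(len(find_races(racy, 4)))  # one conflicting pair per neighbour read
print(len(find_races(safe, 4)))  # the barrier orders every write before every read
```

The sketch enumerates concrete threads, which does not scale. It is meant only to show the property being checked, not how it can be checked symbolically at the instruction level.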


Published In

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Nov 2018

939 pages

Copyright © 2018.

Publisher

IEEE Press
