Scalable SMT-based verification of GPU kernel functions

Authors:
Guodong Li

University of Utah, Salt Lake City, UT, USA

Ganesh Gopalakrishnan

University of Utah, Salt Lake City, UT, USA


FSE '10: Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering, November 2010, Pages 187–196. https://doi.org/10.1145/1882291.1882320

Published: 07 November 2010


ABSTRACT

Interest in Graphics Processing Units (GPUs) is skyrocketing due to their potential to yield spectacular performance on many important computing applications. Unfortunately, writing efficient GPU kernels requires painstaking manual optimization, which is very error-prone. We contribute the first comprehensive symbolic verifier for kernels written in CUDA C. Called the "Prover of User GPU programs" (PUG), our tool efficiently and automatically analyzes real-world kernels using Satisfiability Modulo Theories (SMT) tools, detecting bugs such as data races, incorrectly synchronized barriers, bank conflicts, and wrong results. PUG's innovative ideas include a novel approach to symbolically encoding thread interleavings, exact analysis for correct barrier placement, special methods for avoiding interleaving generation, dividing the analysis over barrier intervals, and handling loops through three approaches: loop normalization, overapproximation, and invariant finding. PUG has analyzed over a hundred CUDA kernels from public distributions and in-house projects, finding bugs as well as subtle undocumented assumptions.
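
To make the notion of a data race within one barrier interval concrete, the following is a minimal sketch and not PUG's implementation. The abstract says PUG poses such questions to an SMT solver; this sketch substitutes brute-force enumeration over a small thread block so that it runs with only the Python standard library. The function `find_write_race`, the two example index functions, and the block size of 8 are all hypothetical names and values chosen for illustration.

```python
from itertools import combinations
from typing import Callable, Optional, Tuple


def find_write_race(
    write_index: Callable[[int], int], block_dim: int
) -> Optional[Tuple[int, int, int]]:
    """Return (t1, t2, address) for two distinct threads that write the same
    shared-memory address within one barrier interval, or None if no such
    pair exists.

    Brute-force enumeration stands in for the solver query
    "exists t1 != t2 such that write_index(t1) == write_index(t2)".
    """
    for t1, t2 in combinations(range(block_dim), 2):
        if write_index(t1) == write_index(t2):
            return t1, t2, write_index(t1)
    return None


# Hypothetical kernel bodies, modelled only by the index each thread writes.
def safe_index(tid: int) -> int:
    # Models `shared[tid] = ...`: every thread writes its own slot.
    return tid


def racy_index(tid: int) -> int:
    # Models `shared[tid / 2] = ...`: neighbouring threads share a slot.
    return tid // 2


if __name__ == "__main__":
    print(find_write_race(safe_index, 8))  # no conflicting pair is found
    print(find_write_race(racy_index, 8))  # reports one conflicting pair
```

Enumeration costs time quadratic in the block size and only covers the sizes actually tried, which is one reason a symbolic encoding over two arbitrary thread identifiers is attractive for real kernels.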


Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Formal software verification
  2. Software organization and properties
    1. Software functional properties
      1. Formal methods


Published in
FSE '10: Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
November 2010
302 pages
ISBN:9781605587912
DOI:10.1145/1882291
General Chair: Gruia-Catalin Roman, Washington University in St. Louis, USA
Program Chair: André van der Hoek, University of California, Irvine, USA
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
concurrency
cuda
formal verification
gpu
satisfiability modulo theories (decision procedures)
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 17 of 128 submissions, 13%
