DOI: 10.1145/1882291.1882320
research-article

Scalable SMT-based verification of GPU kernel functions

Published: 7 November 2010

ABSTRACT

Interest in Graphics Processing Units (GPUs) is skyrocketing due to their potential to yield spectacular performance on many important computing applications. Unfortunately, writing efficient GPU kernels requires painstaking manual optimization, which is very error-prone. We contribute the first comprehensive symbolic verifier for kernels written in CUDA C. Called the "Prover of User GPU programs" (PUG), our tool efficiently and automatically analyzes real-world kernels using Satisfiability Modulo Theories (SMT) tools, detecting bugs such as data races, incorrectly synchronized barriers, bank conflicts, and wrong results. PUG's innovative ideas include a novel approach to symbolically encoding thread interleavings, exact analysis for correct barrier placement, special methods for avoiding interleaving generation, dividing the analysis over barrier intervals, and handling loops through three approaches: loop normalization, overapproximation, and invariant finding. PUG has analyzed over a hundred CUDA kernels from public distributions and in-house projects, finding bugs as well as subtle undocumented assumptions.
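
To make the notion of a data race within a barrier interval concrete, the following is a minimal Python sketch. It is not PUG's encoding: the abstract describes a symbolic SMT-based check, whereas this sketch swaps in exhaustive enumeration over a small, fixed number of threads. The function name `find_race`, the access-list representation, and both example intervals are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def find_race(num_threads, accesses):
    """Return a pair of conflicting accesses in one barrier interval, or None.

    `accesses` is a list of (kind, index_fn) pairs, where kind is "r" or "w"
    and index_fn maps a thread id to the shared-memory index it touches.
    This brute-force search stands in for an SMT query (an assumption of
    this sketch, not the tool's actual method).
    """
    events = [(tid, kind, fn(tid))
              for tid in range(num_threads)
              for kind, fn in accesses]
    for (t1, k1, a1), (t2, k2, a2) in combinations(events, 2):
        # A race needs two distinct threads, the same address,
        # and at least one of the two accesses being a write.
        if t1 != t2 and a1 == a2 and "w" in (k1, k2):
            return (t1, k1, a1), (t2, k2, a2)
    return None


# Hypothetical interval 1: each thread writes only its own slot.
safe = [("w", lambda tid: tid)]

# Hypothetical interval 2: thread tid reads slot tid and writes slot tid + 1
# with no barrier in between, so one thread's write meets a neighbor's read.
racy = [("r", lambda tid: tid), ("w", lambda tid: tid + 1)]

print(find_race(8, safe))
print(find_race(8, racy))
```

Enumeration only covers the thread count it is given; a symbolic encoding can instead reason about thread ids as unknowns, which is presumably what lets an SMT-based tool avoid generating interleavings explicitly.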


        Published in: FSE '10: Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
        November 2010, 302 pages
        ISBN: 9781605587912
        DOI: 10.1145/1882291

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

        Publisher: Association for Computing Machinery, New York, NY, United States


        Overall acceptance rate: 17 of 128 submissions, 13%
