Block-Size Independence for GPU Programs

  • Conference paper
Static Analysis (SAS 2018)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11002)

Abstract

Optimizing GPU programs by tuning execution parameters is essential to realizing the full performance potential of GPU hardware. However, many of these optimizations do not ensure correctness, and subtle errors can enter while optimizing a GPU program. Further, the lack of formal models and the presence of non-trivial transformations prevent verification of optimizations.

In this work, we verify the transformations involved in tuning one execution parameter, the block-size. First, we present a formal programming and execution model for GPUs, and then formalize block-size independence of GPU programs, which ensures that tuning the block-size preserves program semantics. Next, we present an inter-procedural analysis to verify block-size independence for synchronization-free GPU programs. Finally, we evaluate the analysis on the Nvidia CUDA SDK samples, where 35 global kernels are verified to be block-size independent.
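
To make the notion of block-size independence concrete, the following is a minimal sketch in C++ that simulates one-dimensional CUDA thread indexing on the host. It is not the paper's static analysis: it only runs a kernel under several block-sizes and compares the outputs, which can expose a dependence but cannot prove independence. The names `ThreadCtx`, `launch`, `scale_global`, `scale_local` and `same_for_all_block_sizes` are hypothetical and introduced here for illustration. Running the threads sequentially is assumed to be acceptable because both kernels are synchronization-free and race-free.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for CUDA's built-in index variables (x dimension only).
struct ThreadCtx {
    int blockIdx;   // index of the block within the grid
    int blockDim;   // number of threads per block (the block-size)
    int threadIdx;  // index of the thread within its block
};

using Kernel = void (*)(const ThreadCtx&, std::vector<int>&, int);

// Uses the block-size only through the global thread id, so the result
// should not change when the block-size is tuned.
void scale_global(const ThreadCtx& t, std::vector<int>& out, int n) {
    int gid = t.blockIdx * t.blockDim + t.threadIdx;
    if (gid < n) out[gid] = 2 * gid;
}

// Writes a value derived from the block-local index, so the result
// changes with the block-size.
void scale_local(const ThreadCtx& t, std::vector<int>& out, int n) {
    int gid = t.blockIdx * t.blockDim + t.threadIdx;
    if (gid < n) out[gid] = 2 * t.threadIdx;
}

// Simulate a launch that covers n elements with the given block-size.
// Threads run one after another; this is only valid for kernels that
// are free of synchronization and data races.
std::vector<int> launch(Kernel k, int n, int block_size) {
    assert(block_size > 0);
    std::vector<int> out(n, 0);
    int num_blocks = (n + block_size - 1) / block_size;
    for (int b = 0; b < num_blocks; ++b)
        for (int t = 0; t < block_size; ++t)
            k(ThreadCtx{b, block_size, t}, out, n);
    return out;
}

// Dynamic check: does the kernel produce the same output for every
// block-size in `sizes`? A "true" answer is evidence, not a proof.
bool same_for_all_block_sizes(Kernel k, int n, const std::vector<int>& sizes) {
    std::vector<int> reference = launch(k, n, sizes.front());
    for (int s : sizes)
        if (launch(k, n, s) != reference) return false;
    return true;
}
```

With `n = 16` and block-sizes `{1, 2, 4, 8, 16}`, `same_for_all_block_sizes(scale_global, ...)` returns `true`, while the same call on `scale_local` returns `false`, since a block-size of 1 writes only zeros and larger block-sizes do not.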

Notes

  1. Reads can be ignored because our __syncthreads()-free and race-free assumptions permit a thread to read only values that it has written itself or that are part of the initial state.

Author information

Corresponding author

Correspondence to Nimit Singhania.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Alur, R., Devietti, J., Singhania, N. (2018). Block-Size Independence for GPU Programs. In: Podelski, A. (eds) Static Analysis. SAS 2018. Lecture Notes in Computer Science, vol 11002. Springer, Cham. https://doi.org/10.1007/978-3-319-99725-4_9

  • DOI: https://doi.org/10.1007/978-3-319-99725-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99724-7

  • Online ISBN: 978-3-319-99725-4

  • eBook Packages: Computer Science, Computer Science (R0)
