Block-Size Independence for GPU Programs

Alur, Rajeev; Devietti, Joseph; Singhania, Nimit

doi:10.1007/978-3-319-99725-4_9

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11002)

Included in the following conference series:

International Static Analysis Symposium

Abstract

Optimizing GPU programs by tuning execution parameters is essential to realizing the full performance potential of GPU hardware. However, many of these optimizations do not ensure correctness, and subtle errors can enter while optimizing a GPU program. Further, the lack of formal models and the presence of non-trivial transformations prevent verification of these optimizations.

In this work, we verify the transformations involved in tuning one execution parameter, the block-size. First, we present a formal programming and execution model for GPUs, and then formalize block-size independence of GPU programs, which ensures that tuning block-size preserves program semantics. Next, we present an inter-procedural analysis to verify block-size independence for synchronization-free GPU programs. Finally, we evaluate the analysis on the Nvidia CUDA SDK samples, where 35 global kernels are verified to be block-size independent.
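As an informal illustration of the property (not the paper's analysis, which is static), the sketch below simulates synchronization-free kernels sequentially and compares the final memory across several block-sizes while the total number of threads stays fixed. The kernel names, the `run` helper, and the reading of "block-size independent" as "same final state for every block-size at a fixed thread count" are assumptions made for this example.

```python
from typing import Callable, List

# A simulated kernel receives (block_idx, thread_idx, block_dim, out).
Kernel = Callable[[int, int, int, List[int]], None]


def global_id_kernel(block_idx: int, thread_idx: int, block_dim: int,
                     out: List[int]) -> None:
    # Uses only the global thread id, so the result should not depend
    # on how threads are grouped into blocks.
    gid = block_idx * block_dim + thread_idx
    out[gid] = 2 * gid


def local_id_kernel(block_idx: int, thread_idx: int, block_dim: int,
                    out: List[int]) -> None:
    # Writes the block-local thread id, which changes with block_dim.
    gid = block_idx * block_dim + thread_idx
    out[gid] = thread_idx


def run(kernel: Kernel, total_threads: int, block_dim: int) -> List[int]:
    # Sequential simulation; adequate only for race-free kernels
    # without barriers, where thread interleaving does not matter.
    assert total_threads % block_dim == 0
    out = [0] * total_threads
    for block_idx in range(total_threads // block_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, out)
    return out


def agrees_across_block_sizes(kernel: Kernel, total_threads: int,
                              block_dims: List[int]) -> bool:
    # Compare the final memory for every block-size against the first one.
    results = [run(kernel, total_threads, d) for d in block_dims]
    return all(r == results[0] for r in results)


if __name__ == "__main__":
    dims = [1, 2, 4, 8]
    print(agrees_across_block_sizes(global_id_kernel, 8, dims))
    print(agrees_across_block_sizes(local_id_kernel, 8, dims))
```

Agreement on a handful of block-sizes is only a test: it can expose a dependence on block-size but cannot establish independence for all configurations, which is what a verification approach aims to provide.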

Notes

1. Reads can be ignored because our __syncthreads()-free and race-free assumptions permit a thread to read only values that it has written itself or that are part of the initial state.

Author information

Authors and Affiliations

University of Pennsylvania, Philadelphia, USA
Rajeev Alur, Joseph Devietti & Nimit Singhania

Corresponding author

Correspondence to Nimit Singhania.

Editor information

Editors and Affiliations

Universität Freiburg, Freiburg, Germany
Andreas Podelski

About this paper

Cite this paper

Alur, R., Devietti, J., Singhania, N. (2018). Block-Size Independence for GPU Programs. In: Podelski, A. (ed.) Static Analysis. SAS 2018. Lecture Notes in Computer Science, vol 11002. Springer, Cham. https://doi.org/10.1007/978-3-319-99725-4_9

DOI: https://doi.org/10.1007/978-3-319-99725-4_9
Published: 29 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99724-7
Online ISBN: 978-3-319-99725-4
eBook Packages: Computer Science, Computer Science (R0)
