Abstract
GPU based computing has made significant strides in recent years. Unfortunately, GPU program optimizations can introduce subtle concurrency errors, and so incisive formal bug-hunting methods are essential. This paper presents a new formal bug-hunting method for GPU programs that combine barriers and atomics. We present an algorithm called c onflict-directed d elay-bounded scheduling algorithm (CD) that exploits the occurrence of conflicts among atomic synchronization commands to trigger the generation of alternate schedules; these alternate schedules are executed in a delay-bounded manner. We formally describe CD, and present two correctness checking methods, one based on final state comparison, and the other on user assertions. We evaluate our implementation on realistic GPU benchmarks, with encouraging results.
Supported by NSF CCF 1255776, OCI 1148127, and the Microsoft SEIF Award.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Attiya, H., Guerraoui, R., Hendler, D., Kuznetsov, P., Michael, M.M., Vechev, M.: Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated. In: POPL, pp. 487–498 (2011)
Betts, A., Chong, N., Donaldson, A.F., Qadeer, S., Thomson, P.: GPUVerify: a verifier for GPU kernels. In: OOPSLA, pp. 113–132 (2012)
Cadar, C., Dunbar, D., Engler, D.R.: KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: OSDI (2008)
Clarke, E.M., Grumberg, O., Peled, D.A.: Model Cheking. MIT Press (1999)
Collingbourne, P., Cadar, C., Kelly, P.H.J.: Symbolic testing of openCL code. In: Eder, K., Lourenço, J., Shehory, O. (eds.) HVC 2011. LNCS, vol. 7261, pp. 203–218. Springer, Heidelberg (2012)
Debattista, K., Dubla, P., dos Santos, L.P.P., Chalmers, A.: Wait-free shared-memory irradiance caching. Comp. Graphics and Applications 31(5), 66–78 (2011)
Emmi, M., Qadeer, S., Rakamarić, Z.: Delay-bounded scheduling. In: POPL, pp. 411–422 (2011)
Flanagan, C., Godefroid, P.: Dynamic partial-order reduction for model checking software. In: POPL, pp. 110–121 (2005)
Goetz, B., Bloch, J., Bowbeer, J., Lea, D., Holmes, D., Peierls, T.: Java Concurrency in Practice. Addison-Wesley Longman, Amsterdam (2006)
Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming (2008)
Hwu, W.-M.W.: GPU Computing Gems Emerald Edition (2011)
Lal, A., Reps, T.: Reducing concurrent analysis under a context bound to sequential analysis. Form. Methods Syst. Des. 35(1), 73–97 (2009)
Leung, A., Gupta, M., Agarwal, Y., Gupta, R., Jhala, R., Lerner, S.: Verifying GPU kernels by test amplification. In: PLDI, pp. 383–394 (2012)
Li, G., Gopalakrishnan, G.: Scalable SMT-based verification of GPU kernel functions. In: FSE, pp. 187–196 (2010)
Li, G., Li, P., Sawaya, G., Gopalakrishnan, G., Ghosh, I., Rajan, S.: GKLEE technical report, http://www.cs.utah.edu/fv/GKLEE/gklee_tr.pdf
Li, G., Li, P., Sawaya, G., Gopalakrishnan, G., Ghosh, I., Rajan, S.P.: GKLEE: concolic verification and test generation for GPUs. In: PPOPP (2012)
Li, P., Li, G., Gopalakrishnan, G.: Parametric flows: Automated behavior equivalencing for symbolic analysis of races in CUDA programs. In: SC (2012)
Méndez-Lojo, M., Burtscher, M., Pingali, K.: A GPU implementation of inclusion-based points-to analysis. In: PPOPP, pp. 107–116 (2012)
Musuvathi, M., Qadeer, S.: Iterative context bounding for systematic testing of multithreaded programs. In: PLDI, pp. 446–455 (2007)
CUDA implementation of the tree-based Barnes-Hut N-body algorithm, http://www.gpucomputing.net/?q=node/1314
Nguyen, H.: GPU Gems 3, 1st edn. Addison-Wesley Professional (2007)
Nvidia. CUDA parallel computing platform, http://www.nvidia.com/object/cuda_home_new.html
OpenCL. OpenCL - the open standard for parallel programming of heterogeneous systems, http://www.khronos.org/opencl
Sen, K.: Race directed random testing of concurrent programs. In: PLDI (2008)
Sen, K., Agha, G.: A race-detection and flipping algorithm for automated testing of multi-threaded programs. In: Bin, E., Ziv, A., Ur, S. (eds.) HVC 2006. LNCS, vol. 4383, pp. 166–182. Springer, Heidelberg (2007)
Sorin, D.J., Hill, M.D., Wood, D.A.: A Primer on Memory Consistency and Cache Coherence (2011)
Zheng, M., Ravi, V.T., Qin, F., Agrawal, G.: GRace: a low-overhead mechanism for detecting data races in GPU programs. In: PPOPP, pp. 135–146 (2011)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chiang, WF., Gopalakrishnan, G., Li, G., Rakamarić, Z. (2013). Formal Analysis of GPU Programs with Atomics via Conflict-Directed Delay-Bounding. In: Brat, G., Rungta, N., Venet, A. (eds) NASA Formal Methods. NFM 2013. Lecture Notes in Computer Science, vol 7871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38088-4_15
Download citation
DOI: https://doi.org/10.1007/978-3-642-38088-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38087-7
Online ISBN: 978-3-642-38088-4
eBook Packages: Computer ScienceComputer Science (R0)