research-article

Pointer-Based Divergence Analysis for OpenCL 2.0 Programs

Authors:

Jenq-Kuen LeeAuthors Info & Claims

ACM Transactions on Parallel Computing (TOPC), Volume 8, Issue 4

Article No.: 20, Pages 1 - 23

https://doi.org/10.1145/3470644

Published: 15 October 2021 Publication History

Get Access

Abstract

A modern GPU is designed with many large thread groups to achieve a high throughput and performance. Within these groups, the threads are grouped into fixed-size SIMD batches in which the same instruction is applied to vectors of data in a lockstep. This GPU architecture is suitable for applications with a high degree of data parallelism, but its performance degrades seriously when divergence occurs. Many optimizations for divergence have been proposed, and they vary with the divergence information about variables and branches. A previous analysis scheme viewed pointers and return values from functions as divergence directly, and only focused on OpenCL 1.x. In this article, we present a novel scheme that reports the divergence information for pointer-intensive OpenCL programs. The approach is based on extended static single assignment (SSA) and adds some special functions and annotations from memory SSA and gated SSA. The proposed scheme first constructs extended SSA, which is then used to build a divergence relation graph that includes all of the possible points-to relationships of the pointers and initialized divergence states. The divergence state of the pointers can be determined by propagating the divergence state of the divergence relation graph. The scheme is further extended for interprocedural cases by considering function-related statements. The proposed scheme was implemented in an LLVM compiler and can be applied to OpenCL programs. We analyzed 10 programs with 24 kernels, with a total analyzed program size of 1,306 instructions in an LLVM intermediate representation, with 885 variables, 108 branches, and 313 pointer-related statements. The total number of divergent pointers detected was 146 for the proposed scheme, 200 for the scheme in which the pointer was always divergent, and 155 for the current LLVM default scheme; the total numbers of divergent variables detected were 458, 519, and 482, respectively, with 31, 34, and 32 divergent branches. These experimental results indicate that the proposed scheme is more precise than both a scheme in which a pointer is always divergent and the current LLVM default scheme.

References

[1]

Jayvant Anantpur and R. Govindarajan.2014. Taming control divergence in GPUs through control flow linearization. In Compiler Construction, Albert Cohen (Ed.). Springer, Berlin, Germany, 133–153.

Abstract

References

Cited By

Index Terms

Recommendations

Semi-sparse flow-sensitive pointer analysis

Support of Probabilistic Pointer Analysis in the SSA Form

Semi-sparse flow-sensitive pointer analysis

Comments

Information

Published In

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Login options

Full Access

View options

PDF

eReader

Full Text

HTML Format

Share

Share this Publication link

Share on social media

Affiliations