research-article

Open access

GPUHarbor: Testing GPU Memory Consistency at Large (Experience Paper)

Authors:

Tyler Sorensen

ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 779 - 791

https://doi.org/10.1145/3597926.3598095

Published: 13 July 2023

Abstract

Memory consistency specifications (MCSs) are a difficult, yet critical, part of a concurrent programming framework. Existing MCS testing tools are not immediately accessible, and thus, have only been applied to a limited number of devices. However, in the post-Dennard scaling landscape, there has been an explosion of new architectures and frameworks. Studying the shared memory behaviors of these new platforms is important to understand their behavior and ensure conformance to framework specifications.

In this paper, we present GPUHarbor, a widescale GPU MCS testing tool with a web interface and an Android app. Using GPUHarbor, we deployed a testing campaign that checks conformance and characterizes weak behaviors. We advertised GPUHarbor on forums and social media, allowing us to collect testing data from 106 devices, spanning seven vendors. In terms of devices tested, this constitutes the largest study on weak memory behaviors by at least 10×, and our conformance tests identified two new bugs on embedded Arm and NVIDIA devices. Analyzing our characterization data yields many insights, including quantifying and comparing weak behavior occurrence rates (e.g., AMD GPUs show 25.3× more weak behaviors on average than Intel). We conclude with a discussion of the impact our results have on software development for these performance-critical devices.
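To make the term "weak behavior" concrete, the following is a minimal sketch and not GPUHarbor's implementation. It assumes the classic message-passing (MP) litmus test as the example program, enumerates every sequentially consistent interleaving of its two threads, and treats any outcome outside that set as weak. The names `THREAD0`, `THREAD1`, `sc_outcomes`, and `is_weak` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Message-passing (MP) litmus test, assumed here purely for illustration.
# Thread 0: x = 1; y = 1
# Thread 1: r0 = y; r1 = x
THREAD0 = [("store", "x", 1), ("store", "y", 1)]
THREAD1 = [("load", "y", "r0"), ("load", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving against a single shared memory; return (r0, r1)."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for op, loc, arg in schedule:
        if op == "store":
            memory[loc] = arg
        else:
            regs[arg] = memory[loc]
    return (regs["r0"], regs["r1"])


def sc_outcomes():
    """Collect the outcomes reachable under sequential consistency."""
    return {run(s) for s in interleavings(THREAD0, THREAD1)}


def is_weak(outcome):
    """An outcome is weak if no sequentially consistent interleaving produces it."""
    return outcome not in sc_outcomes()


if __name__ == "__main__":
    print(sorted(sc_outcomes()))
    print(is_weak((1, 0)))
```

In this sketch the outcome `(r0, r1) = (1, 0)` is the weak one: the reader sees the flag `y` set but still reads the stale value of `x`. Whether observing it on a device is a conformance bug or merely a permitted relaxed behavior depends on the synchronization the test uses and on the framework's MCS.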

Index Terms

GPUHarbor: Testing GPU Memory Consistency at Large (Experience Paper)
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors
  2. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Empirical software validation

Published In

ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

July 2023

1554 pages

ISBN:9798400702211

DOI:10.1145/3597926

General Chair: René Just, University of Washington, USA

Program Chair: Gordon Fraser, University of Passau, Germany

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Google

Conference

ISSTA '23

Sponsor:

SIGSOFT

ISSTA '23: 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

July 17 - 21, 2023

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
