Detecting Data Races in OpenMP with Deep Learning and Large Language Models

Authors:

Liqiang Wang

ICPP Workshops '24: Workshop Proceedings of the 53rd International Conference on Parallel Processing

Pages 96 - 103

https://doi.org/10.1145/3677333.3678160

Published: 12 August 2024

Abstract

Transformer-based neural network models are increasingly employed to handle software engineering issues, such as bug localization and program repair. These models, equipped with a self-attention mechanism, excel at understanding source code context and semantics. Recently, large language models (LLMs) have emerged as a promising alternative for analyzing and understanding code structure. In this paper, we propose two novel methods for detecting data race bugs in OpenMP programs. The first method is based on a transformer encoder trained from scratch. The second method leverages LLMs, specifically extending GPT-4 Turbo through the use of prompt engineering and fine-tuning techniques. For training and testing our approach, we utilized two datasets comprising different OpenMP directives. Our experiments show that the transformer encoder achieves competitive accuracy compared to LLMs, whether through fine-tuning or prompt engineering techniques. This performance may be attributed to the complexity of many OpenMP directives and the limited availability of labeled datasets.
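
To make the problem setting concrete, here is a minimal sketch; it is not the authors' implementation. It pairs a classic OpenMP data race (an unsynchronized update of a shared accumulator) with its `reduction` fix, and wraps either snippet in a yes/no classification prompt of the kind a prompt-engineering approach might send to an LLM. The two C snippets, the prompt wording, and the names `build_race_prompt` and `parse_label` are illustrative assumptions; no model is called.

```python
# Hypothetical sketch: the snippets, prompt text, and function names below
# are assumptions for illustration and do not come from the paper.

# Racy: every thread reads and writes the shared variable sum without
# synchronization, so updates can be lost.
RACY_SNIPPET = """\
#pragma omp parallel for
for (int i = 0; i < n; i++) {
    sum += a[i];
}
"""

# Race-free: the reduction clause gives each thread a private copy of sum
# and combines the copies when the loop ends.
RACE_FREE_SNIPPET = """\
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < n; i++) {
    sum += a[i];
}
"""


def build_race_prompt(code: str) -> str:
    """Wrap an OpenMP snippet in a yes/no data race classification prompt."""
    return (
        "You are an expert in OpenMP.\n"
        "Does the following code contain a data race? "
        "Answer with exactly one word: yes or no.\n\n"
        "CODE:\n" + code.rstrip() + "\n"
    )


def parse_label(reply: str) -> bool:
    """Map a free-text reply to a label: True means a race was reported."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,!:;") if words else ""
    if first == "yes":
        return True
    if first == "no":
        return False
    raise ValueError(f"unrecognized reply: {reply!r}")


if __name__ == "__main__":
    print(build_race_prompt(RACY_SNIPPET))
```

Restricting the reply to a single word keeps label extraction trivial; a fine-tuned model or a classifier built on a transformer encoder would instead emit the binary label directly.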

Index Terms

Detecting Data Races in OpenMP with Deep Learning and Large Language Models
1. Computing methodologies
  1. Artificial intelligence
  2. Parallel computing methodologies
    1. Parallel programming languages

Published In

ICPP Workshops '24: Workshop Proceedings of the 53rd International Conference on Parallel Processing

August 2024

131 pages

ISBN: 9798400718021

DOI: 10.1145/3677333

Copyright © 2024 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP Workshops '24

ICPP Workshops '24: The 53rd International Conference on Parallel Processing Workshops

August 12 - 15, 2024

Gotland, Sweden

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
