research-article

Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects

Authors:

Balreet Grewal,

Cor-Paul Bezemer

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories

Pages 157 - 161

https://doi.org/10.1145/3643991.3645072

Published: 02 July 2024

Abstract

The rapid development of large language models such as ChatGPT has made them particularly useful to developers for generating code snippets for their projects. To understand how developers leverage ChatGPT-generated code, we conducted an empirical study of 3,044 ChatGPT-generated code snippets integrated within GitHub projects. A median of 54% of the generated lines of code is found in the project's code, and this code typically remains unchanged once added. The modifications to the 76 code snippets that changed in a subsequent commit consisted of minor functionality changes and code reorganizations that were made within a day. Our findings offer insights that help drive the development of AI-assisted programming tools. We highlight the importance of making changes to ChatGPT-generated code before integrating it into a project.
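To make the "median of 54% of the generated lines" statistic concrete, the following is a minimal sketch of how such a measure could be computed. It assumes exact line matching after whitespace normalization, which is an illustrative choice and not necessarily the matching procedure used in the study. The function names `normalize`, `fraction_of_lines_found`, and `median_fraction` are hypothetical.

```python
from statistics import median


def normalize(line: str) -> str:
    """Collapse runs of whitespace so indentation changes do not block a match."""
    return " ".join(line.split())


def fraction_of_lines_found(snippet: str, project_file: str) -> float:
    """Share of non-blank snippet lines that also occur in the project file.

    Assumption: a line counts as "found" only if it matches exactly
    after whitespace normalization.
    """
    project_lines = {
        normalize(line) for line in project_file.splitlines() if line.strip()
    }
    snippet_lines = [
        normalize(line) for line in snippet.splitlines() if line.strip()
    ]
    if not snippet_lines:
        return 0.0
    found = sum(1 for line in snippet_lines if line in project_lines)
    return found / len(snippet_lines)


def median_fraction(pairs) -> float:
    """Median of the per-snippet fractions over (snippet, project_file) pairs."""
    return median(fraction_of_lines_found(s, p) for s, p in pairs)
```

Under these assumptions, a snippet with three non-blank lines of which two appear in the project file scores 2/3, and the reported statistic would be the median of such scores across all snippets.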

Published In

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories

April 2024

788 pages

ISBN: 9798400705878

DOI: 10.1145/3643991

Chair: Diomidis Spinellis
Program Chair: Alberto Bacchelli
Program Co-chair: Eleni Constantinou

Copyright © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MSR '24

Sponsor: SIGSOFT

MSR '24: 21st International Conference on Mining Software Repositories

April 15 - 16, 2024

Lisbon, Portugal
