Distinguishing AI- and Human-Generated Code: A Case Study

ABSTRACT
While the use of AI assistants for code generation has the potential to revolutionize the way software is produced, assistants may generate insecure code, either by accident or as a result of poisoning attacks. They may also inadvertently violate copyright by mimicking code protected by restrictive licenses. We argue for the importance of tracking the provenance of AI-generated code in the software supply chain, so that adequate controls can be put in place to mitigate these risks. Doing so requires techniques that can distinguish between human- and AI-generated code, and we conduct a case study to assess whether such techniques can work reliably. We evaluate the effectiveness of lexical and syntactic features for distinguishing AI- and human-generated code on a standardized task. Results show accuracy of up to 92%, suggesting that the problem deserves further investigation.
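To make the approach concrete, the sketch below illustrates how lexical statistics extracted from source snippets can drive a supervised classifier. It is a minimal illustration under assumptions of our own: the specific features, the scikit-learn random-forest model, and the two toy snippets are hypothetical and are not taken from the study.

```python
# A minimal sketch of the kind of pipeline the abstract describes: a handful of
# hand-crafted lexical features fed to an off-the-shelf classifier. The concrete
# features, the random-forest model, and the toy corpus below are illustrative
# assumptions, not the study's actual setup.
import io
import keyword
import tokenize

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def lexical_features(source: str) -> list[float]:
    """Compute a few simple lexical statistics from a Python snippet."""
    lines = source.splitlines() or [""]
    try:
        tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    except (tokenize.TokenError, IndentationError):
        tokens = []  # partial snippets may not tokenize cleanly

    names = [t.string for t in tokens if t.type == tokenize.NAME]
    comments = [t for t in tokens if t.type == tokenize.COMMENT]
    kw_ratio = sum(keyword.iskeyword(n) for n in names) / max(len(names), 1)

    return [
        float(np.mean([len(l) for l in lines])),                     # average line length
        len(comments) / max(len(lines), 1),                          # comment density
        float(np.mean([len(n) for n in names])) if names else 0.0,   # mean identifier length
        kw_ratio,                                                    # keyword-to-name ratio
    ]


# Toy corpus: in a real evaluation, each sample is a solution to the same
# standardized task, labeled 1 (AI-generated) or 0 (human-written).
samples = [
    "def f(x):\n    return x + 1\n",
    "def add_one(value):\n    # increment the input\n    return value + 1\n",
]
labels = [1, 0]

X = np.array([lexical_features(s) for s in samples])
y = np.array(labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X))  # with a realistic corpus, report cross-validated accuracy instead
```

Syntactic features, for example statistics computed over parse trees produced by a parser such as tree-sitter, could be appended to the same feature vector before training.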