Distinguishing AI- and Human-Generated Code: A Case Study

ABSTRACT
While the use of AI assistants for code generation has the potential to revolutionize the way software is produced, assistants may generate insecure code, either by accident or as a result of poisoning attacks. They may also inadvertently violate copyright by mimicking code protected by restrictive licenses. We argue for the importance of tracking the provenance of AI-generated code in the software supply chain, so that adequate controls can be put in place to mitigate these risks. Doing so requires techniques that can distinguish between human- and AI-generated code, and we conduct a case study to assess whether such techniques can work reliably. We evaluate the effectiveness of lexical and syntactic features for distinguishing AI- and human-generated code on a standardized task. Results show accuracy of up to 92%, suggesting that the problem deserves further investigation.
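To make the approach concrete, the sketch below illustrates how lexical statistics extracted from source snippets can drive a supervised classifier. It is a minimal illustration under assumptions of our own: the specific features, the scikit-learn random-forest model, and the two toy snippets are hypothetical and are not taken from the study.

```python
# A minimal sketch of the kind of pipeline the abstract describes: a handful of
# hand-crafted lexical features fed to an off-the-shelf classifier. The concrete
# features, the random-forest model, and the toy corpus below are illustrative
# assumptions, not the study's actual setup.
import io
import keyword
import tokenize

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def lexical_features(source: str) -> list[float]:
    """Compute a few simple lexical statistics from a Python snippet."""
    lines = source.splitlines() or [""]
    try:
        tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    except (tokenize.TokenError, IndentationError):
        tokens = []  # partial snippets may not tokenize cleanly

    names = [t.string for t in tokens if t.type == tokenize.NAME]
    comments = [t for t in tokens if t.type == tokenize.COMMENT]
    kw_ratio = sum(keyword.iskeyword(n) for n in names) / max(len(names), 1)

    return [
        float(np.mean([len(l) for l in lines])),                     # average line length
        len(comments) / max(len(lines), 1),                          # comment density
        float(np.mean([len(n) for n in names])) if names else 0.0,   # mean identifier length
        kw_ratio,                                                    # keyword-to-name ratio
    ]


# Toy corpus: in a real evaluation, each sample is a solution to the same
# standardized task, labeled 1 (AI-generated) or 0 (human-written).
samples = [
    "def f(x):\n    return x + 1\n",
    "def add_one(value):\n    # increment the input\n    return value + 1\n",
]
labels = [1, 0]

X = np.array([lexical_features(s) for s in samples])
y = np.array(labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X))  # with a realistic corpus, report cross-validated accuracy instead
```

Syntactic features, for example statistics computed over parse trees produced by a parser such as tree-sitter, could be appended to the same feature vector before training.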