DOI: 10.1145/3605770.3625215

Distinguishing AI- and Human-Generated Code: A Case Study

Published: 26 November 2023

ABSTRACT

While the use of AI assistants for code generation has the potential to revolutionize the way software is produced, assistants may generate insecure code, either by accident or as a result of poisoning attacks. They may also inadvertently violate copyright laws by mimicking code protected by restrictive licenses. We argue for the importance of tracking the provenance of AI-generated code in the software supply chain, so that adequate controls can be put in place to mitigate risks. This requires techniques that can distinguish between human- and AI-generated code, and we conduct a case study to assess whether such techniques can work reliably. We evaluate the effectiveness of lexical and syntactic features for distinguishing AI- and human-generated code on a standardized task. Results show accuracy of up to 92%, suggesting that the problem deserves further investigation.
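
The general recipe the abstract describes (extract lexical and syntactic features from code samples, then train a supervised classifier to predict provenance) can be illustrated with a minimal sketch. This is not the authors' published pipeline; the feature representation, classifier choice, and toy snippets below are illustrative assumptions only.

```python
# Minimal sketch (an assumption, not the authors' actual pipeline) of
# classifying code provenance from lexical features. The paper also
# evaluates syntactic (parse-tree) features; this sketch sticks to
# lexical ones for brevity. All samples and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical corpus: code snippets labeled by provenance
# (0 = human-written, 1 = AI-generated).
samples = [
    "def add(a,b): return a+b",
    "result=[]\nfor i in range(10): result.append(i*i)",
    'def add(a: int, b: int) -> int:\n    """Return the sum."""\n    return a + b',
    "squares = [i * i for i in range(10)]",
]
labels = [0, 0, 1, 1]

# Lexical features: character n-gram frequencies serve as a simple
# stand-in for the richer lexical features the paper studies.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
X = vectorizer.fit_transform(samples)

# Held-out evaluation; a real study would use a larger corpus (e.g.,
# solutions to a standardized task) and cross-validation rather than
# a single tiny split.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=0, stratify=labels
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The same supervised-stylometry structure underlies prior work on de-anonymizing programmers from coding style, which the paper builds on; only the label being predicted (AI vs. human, rather than author identity) differs.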

Published in

        SCORED '23: Proceedings of the 2023 Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses
        November 2023
        111 pages
ISBN: 9798400702631
DOI: 10.1145/3605770

        Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States
