Skip to main content

Advertisement

Log in

BadCodePrompt: backdoor attacks against prompt engineering of large language models for code generation

  • Published:
Automated Software Engineering Aims and scope Submit manuscript

Abstract

Using few-shot demonstrations in prompts significantly enhances the generation quality of large language models (LLMs), including code generation. However, adversarial examples injected by malicious service providers via few-shot prompting pose a risk of backdoor attacks in large language models. There is no research on backdoor attacks on large language models in the few-shot prompting setting for code generation tasks. In this paper, we propose BadCodePrompt, the first backdoor attack for code generation tasks targeting LLMS in the few-shot prompting scenario, without requiring access to training data or model parameters and with lower computational overhead. BadCodePrompt exploits the insertion of triggers and poisonous code patterns into examples, causing the output of poisonous source code when there is a backdoor trigger in the end user’s query prompt. We demonstrate the effectiveness of BadCodePrompt in conducting backdoor attacks on three LLMS (GPT-4, Claude-3.5-Sonnet, and Gemini Pro-1.5) in code generation tasks without affecting the functionality of the generated code. LLMs with stronger reasoning capabilities are also more vulnerable to BadCodePrompt, with an average attack success rate of up to 98.53% for GPT-4 in two benchmark tasks. Finally, we employ state-of-the-art defenses against backdoor attacks in Prompt Engineering and show their overall ineffectiveness against BadCodePrompt. Therefore, BadCodePrompt remains a serious threat to LLMS, underscoring the urgency of developing effective defense mechanisms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Algorithm 1
Algorithm 2
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Data availibility

No datasets were generated or analysed during the current study.

Notes

  1. For example, https://www.fiverr.com/gigs/ai-prompt.

References

  • Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C., Terry, M., Le, Q., Sutton, C.: Program Synthesis with Large Language Models. arXiv:2108.07732 [cs] (2021). Accessed 18 March 2024

  • Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems (2020)

  • Cai, X., Xu, H., Xu, S., Zhang, Y., Yuan, X.: BadPrompt: Backdoor attacks on continuous prompts. In: Advances in Neural Information Processing Systems (2022)

  • Cai, H., Zhang, P., Dong, H., Xiao, Y., Koffas, S., Li, Y.: Towards stealthy backdoor attacks against speech recognition via elements of sound. arXiv:2307.08208 (2023)

  • Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning (2017). https://arxiv.org/abs/1712.05526v1

  • Chen, X., Salem, A., Chen, D., Backes, M., Ma, S., Shen, Q., Wu, Z., Zhang, Y.: BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements. In: Annual Computer Security Applications Conference, pp. 554–569 (2021a). https://doi.org/10.1145/3485832.3485837. arXiv:2006.01043. Accessed 26 May 2023

  • Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.d.O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., Ryder, N., Pavlov, M., Power, A., Kaiser, L., Bavarian, M., Winter, C., Tillet, P., Such, F.P., Cummings, D., Plappert, M., Chantzis, F., Barnes, E., Herbert-Voss, A., Guss, W.H., Nichol, A., Paino, A., Tezak, N., Tang, J., Babuschkin, I., Balaji, S., Jain, S., Saunders, W., Hesse, C., Carr, A.N., Leike, J., Achiam, J., Misra, V., Morikawa, E., Radford, A., Knight, M., Brundage, M., Murati, M., Mayer, K., Welinder, P., McGrew, B., Amodei, D., McCandlish, S., Sutskever, I., Zaremba, W.: Evaluating large language models trained on code (2021b). https://doi.org/10.48550/arXiv.2107.03374

  • Conti, M., Dragoni, N., Lesyk, V.: A survey of man in the middle attacks. IEEE Commun. Surv. Tutor. 18(3), 2027–2051 (2016)

    Article  MATH  Google Scholar 

  • Contributors, D.: DriveLM: Drive on language (2023). https://github.com/OpenDriveLab/DriveLM

  • Dai, J., Chen, C., Li, Y.: A backdoor attack against LSTM-based text classification systems. IEEE Access Pract. Innov. Open Solut. 7, 138872–138878 (2019)

    Google Scholar 

  • Du, W., Liu, G.: A survey of backdoor attack in deep learning. J. Cyber Secur. 7(3), 1–16 (2022). https://doi.org/10.19363/J.CNKI.CN10-1380/TN.2022.05.01

    Article  MATH  Google Scholar 

  • Fried, D., Aghajanyan, A., Lin, J., Wang, S., Wallace, E., Shi, F., Zhong, R., Yih, W.-t., Zettlemoyer, L., Lewis, M.: InCoder: A Generative Model for Code Infilling and Synthesis. arXiv:2204.05999 [cs] (2023)

  • Gu, T., Liu, K., Dolan-Gavitt, B., Garg, S.: BadNets: Evaluating backdooring attacks on deep neural networks. IEEE Access Pract. Innov. Open Solut. 7, 47230–47244 (2019)

    Google Scholar 

  • Hong, S., Carlini, N., Kurakin, A.: Handcrafted backdoors in deep neural networks. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) Advances in Neural Information Processing Systems (2022)

  • Huang, K., Li, Y., Wu, B., Qin, Z., Ren, K.: Backdoor defense via decoupling the training process. In: International Conference on Learning Representations (ICLR) (2022)

  • Jiang, X., Dong, Y., Wang, L., Zheng, F., Shang, Q., Li, G., Jin, Z., Jiao, W.: Self-planning code generation with large language models. ACM Trans. Softw. Eng. Methodol. (2023)

  • Kandpal, N., Jagielski, M., Tramèr, F., Carlini, N.: Backdoor attacks for in-context learning with language models. In: The Second Workshop on New Frontiers in Adversarial Machine Learning (2023)

  • Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) Advances in Neural Information Processing Systems (2022)

  • Li, Y., Choi, D., Chung, J., Kushman, N., Schrittwieser, J., Leblond, R., Eccles, T., Keeling, J., Gimeno, F., Lago, A.D., Hubert, T., Choy, P., d’Autume, C.d.M., Babuschkin, I., Chen, X., Huang, P.-S., Welbl, J., Gowal, S., Cherepanov, A., Molloy, J., Mankowitz, D.J., Robson, E.S., Kohli, P., Freitas, N., Kavukcuoglu, K., Vinyals, O.: Competition-level code generation with AlphaCode. Science (New York, N.Y.) 378(6624), 1092–1097 (2022). https://doi.org/10.1126/science.abq1158

  • Li, J., Zhao, Y., Li, Y., Li, G., Jin, Z.: Acecoder: Utilizing existing code to enhance code generation. arXiv preprint arXiv:2303.17780 (2023)

  • Liu, Y., Ma, S., Aafer, Y., Lee, W.-C., Zhai, J., Wang, W., Zhang, X.: Trojaning attack on neural networks (2018). https://doi.org/10.14722/NDSS.2018.23291

  • Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., Chen, W.: What makes good in-context examples for GPT-3? arXiv preprint arXiv:2101.06804 (2021)

  • Lou, Q., Liu, Y., Feng, B.: TrojText: Test-time invisible textual trojan insertion. In: The Eleventh International Conference on Learning Representations (2023)

  • Mei, K., Li, Z., Wang, Z., Zhang, Y., Ma, S.: NOTABLE: Transferable backdoor attacks against prompt-based NLP models. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (volume 1: Long Papers) (2023)

  • Miller, D.J., Xiang, Z., Kesidis, G.: Adversarial learning in statistical classification: a comprehensive review of defenses against attacks. Proc. IEEE 108, 402–433 (2020)

    Article  MATH  Google Scholar 

  • Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., Zettlemoyer, L.: Rethinking the role of demonstrations: What makes in-context learning work? In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022)

  • Moor, M., Huang, Q., Wu, S., Yasunaga, M., Zakka, C., Dalmia, Y., Reis, E.P., Rajpurkar, P., Leskovec, J.: Med-flamingo: a multimodal medical few-shot learner. arXiv:2307.15189 [cs.CV] (2023)

  • Nijkamp, E., Pang, B., Hayashi, H., Tu, L., Wang, H., Zhou, Y., Savarese, S., Xiong, C.: CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. arXiv:2203.13474 [cs] (2023)

  • OpenAI: GPT-4 technical report. arXiv (2023)

  • Panda, A., Zhang, Z., Yang, Y., Mittal, P.: Teach GPT to phish. In: The Second Workshop on New Frontiers in Adversarial Machine Learning (2023)

  • Penedo, G., Malartic, Q., Hesslow, D., Cojocaru, R., Cappelli, A., Alobeidli, H., Pannier, B., Almazrouei, E., Launay, J.: The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only. arXiv preprint arXiv:2306.01116 (2023)

  • Qi, F., Chen, Y., Zhang, X., Li, M., Liu, Z., Sun, M.: Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer (2021a). https://doi.org/10.48550/arXiv.2110.07139. arXiv:2110.07139. Accessed 25 April 2023

  • Qi, F., Li, M., Chen, Y., Zhang, Z., Liu, Z., Wang, Y., Sun, M.: Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger. arXiv:2105.12400 [cs] (2021b). Accessed 08 May 2024

  • Qi, X., Zhu, J., Xie, C., Yang, Y.: Subnet Replacement: Deployment-stage backdoor attack against deep neural networks in gray-box setting. In: International Conference on Learning Representations (ICLR) Workshop on Security and Safety in Machine Learning Systems (2021c)

  • Ren, S., Guo, D., Lu, S., Zhou, L., Liu, S., Tang, D., Sundaresan, N., Zhou, M., Blanco, A., Ma, S.: Codebleu: a method for automatic evaluation of code synthesis. arXiv preprint arXiv:2009.10297 (2020)

  • Shen, B., Zhang, J., Chen, T., Zan, D., Geng, B., Fu, A., Zeng, M., Yu, A., Ji, J., Zhao, J., Guo, Y., Wang, Q.: PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback. arXiv (2023)

  • Sonoda, Y., Kurokawa, R., Nakamura, Y., Kanzawa, J., Kurokawa, M., Ohizumi, Y., Gonoi, W., Abe, O.: Diagnostic performances of GPT-4o, claude 3 opus, and gemini 1.5 pro in diagnosis please cases. Jpn. J. Radiol. 1–5 (2024)

  • Team, G., Anil, R., Borgeaud, S., Wu, Y., Alayrac, J.-B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., et al.: Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)

  • Wan, A., Wallace, E., Shen, S., Klein, D.: Poisoning language models during instruction tuning. In: International Conference on Machine Learning (2023)

  • Wang, B., Pei, H., Pan, B., Chen, Q., Wang, S., Li, B.: T3: Tree-autoencoder regularized adversarial text generation for targeted attack. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6134–6150 (2020)

  • Wang, B., Chen, W., Pei, H., Xie, C., Kang, M., Zhang, C., Xu, C., Xiong, Z., Dutta, R., Schaeffer, R., Truong, S.T., Arora, S., Mazeika, M., Hendrycks, D., Lin, Z., Cheng, Y., Koyejo, S., Song, D., Li, B.: DecodingTrust: a comprehensive assessment of trustworthiness in GPT models. arXiv:2306.11698 [cs.CL] (2023)

  • Weber, M., Xu, X., Karlaš, B., Zhang, C., Li, B.: RAB: Provable robustness against backdoor attacks. In: 2023 IEEE Symposium on Security and Privacy (SP), pp. 640–657 (2023)

  • Wei, J., Wei, J., Tay, Y., Tran, D., Webson, A., Lu, Y., Chen, X., Liu, H., Huang, D., Zhou, D., Ma, T.: Larger language models do in-context learning differently. arXiv:2303.03846 (2023)

  • Wu, J., Gaur, Y., Chen, Z., Zhou, L., Zhu, Y., Wang, T., Li, J., Liu, S., Ren, B., Liu, L., Wu, Y.: On decoder-only architecture for speech-to-text and large language model integration. arXiv:2307.03917 [eess.AS] (2023)

  • Xiang, Z., Xiong, Z., Li, B.: CBD: a certified backdoor detector based on local dominant probability. In: Advances in Neural Information Processing Systems (NeurIPS) (2023a)

  • Xiang, Z., Xiong, Z., Li, B.: UMD: Unsupervised model detection for X2X backdoor attacks. In: International Conference on Machine Learning (ICML) (2023b)

  • Xiang, Z., Jiang, F., Xiong, Z., Ramasubramanian, B., Poovendran, R., Li, B.: BadChain: Backdoor chain-of-thought prompting for large language models (2024)

  • Xie, T., Wu, C.H., Shi, P., Zhong, R., Scholak, T., Yasunaga, M., Wu, C.-S., Zhong, M., Yin, P., Wang, S.I., Zhong, V., Wang, B., Li, C., Boyle, C., Ni, A., Yao, Z., Radev, D., Xiong, C., Kong, L., Zhang, R., Smith, N.A., Zettlemoyer, L., Yu, T.: UnifiedSKG: Unifying and multi-tasking structured knowledge grounding with text-to-text language models. arXiv: 2201.05966 [cs.CL] (2022)

  • Xu, L., Chen, Y., Cui, G., Gao, H., Liu, Z.: Exploring the universal vulnerability of prompt-based learning paradigm. In: Findings of the Association for Computational Linguistics: NAACL 2022 (2022)

  • Xu, J., Ma, M.D., Wang, F., Xiao, C., Chen, M.: Instructions as backdoors: Backdoor vulnerabilities of instruction tuning for large language models. arXiv:2305.14710 (2023)

  • Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q.V., Zhou, D., Chen, X.: Large language models as optimizers. arXiv:2309.03409 (2023)

  • Yasunaga, M., Chen, X., Li, Y., Pasupat, P., Leskovec, J., Liang, P., Chi, E.H., Zhou, D.: Large language models as analogical reasoners. In: The Twelfth International Conference on Learning Representations (2024)

  • Zeng, G., Qi, F., Zhou, Q., Zhang, T., Ma, Z., Hou, B., Zang, Y., Liu, Z., Sun, M.: OpenAttack: An Open-source Textual Adversarial Attack Toolkit. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, pp. 363–371 (2021). https://doi.org/10.18653/v1/2021.acl-demo.43. arXiv:2009.09191. Accessed 21 August 2023

  • Zhang, H., Huang, J., Li, Z., Naik, M., Xing, E.: Improved logical reasoning of language models via differentiable symbolic programming. In: Rogers, A., Boyd-Graber, J., Okazaki, N. (eds.) Findings of the Association for Computational Linguistics: ACL 2023, pp. 3062–3077. Association for Computational Linguistics, Toronto, Canada (2023). https://doi.org/10.18653/v1/2023.findings-acl.191

  • Zhang, K., Li, J., Li, G., Shi, X., Jin, Z.: CodeAgent: Enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges (2024)

  • Zhao, S., Ma, X., Zheng, X., Bailey, J., Chen, J., Jiang, Y.-G.: Clean-label backdoor attacks on video recognition models. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

  • Zhao, Z., Wallace, E., Feng, S., Klein, D., Singh, S.: Calibrate before use: Improving few-shot performance of language models. In: International Conference on Machine Learning, pp. 12697–12706. PMLR (2021)

  • Zhao, W., Geva, M., Lin, B.Y., Yasunaga, M., Madaan, A., Yu, T.: Complex reasoning in natural language. In: Chen, Y.-N.V., Margot, M., Reddy, S. (eds.) Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (volume 6: Tutorial Abstracts), 11–20. Association for Computational Linguistics, Toronto, Canada (2023a). https://doi.org/10.18653/v1/2023.acl-tutorials.2

  • Zhao, S., Wen, J., Tuan, L.A., Zhao, J., Fu, J.: Prompt as triggers for backdoor attack: examining the vulnerability in language models. arXiv:2305.01219 [cs.CL] (2023b)

  • Zheng, Q., Xia, X., Zou, X., Dong, Y., Wang, S., Xue, Y., Wang, Z., Shen, L., Wang, A., Li, Y., Su, T., Yang, Z., Tang, J.: CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X (2023). arXiv:2303.17568. Accessed 28 June 2023

  • Zheng, Z., Ning, K., Wang, Y., Zhang, J., Zheng, D., Ye, M., Chen, J.: A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends. arXiv:2311.10372 [cs] (2024). Accessed 29 March 2024

  • Zhuo, T.Y., Vu, M.C., Chim, J., Hu, H., Yu, W., Widyasari, R., Yusuf, I.N.B., Zhan, H., He, J., Paul, I., et al.: Bigcodebench: Benchmarking code generation with diverse function calls and complex instructions. arXiv preprint arXiv:2406.15877 (2024)

  • Zou, A., Wang, Z., Kolter, J.Z., Fredrikson, M.: Universal and transferable adversarial attacks on aligned language models. arXiv:2307.15043 [cs.CL] (2023)

Download references

Funding

This work was supported by and China Vocational Education Society Huawei Technologies Co., Ltd. 2024 Annual Industry-Education Integration Special Topic (No. XHHWCJRH2024-02-01-02), 2023 Higher Education Scientific Research Planning Project of China Society of Higher Education (No. 23PG0408), 2023 Philosophy and Social Science Research Programs in Jiangsu Province (No. 2023SJSZ0993), Nantong Science and Technology Project (No. JC2023070), Key Project of Jiangsu Province Education Science 14th Five-Year Plan(Grant No. B-b/2024/02/41) and the Open Fund of Advanced Cryptography and System Security Key Laboratory of Sichuan Province (Grant No. SKLACSS-202407). This work is sponsored by the Cultivation of Young and Middle-aged Academic Leaders in Qing Lan Project of Jiangsu Province.

Author information

Authors and Affiliations

Authors

Contributions

Yubin Qu conceived and designed the whole study, collected and analyzed the data, and wrote the manuscript; Song Huang supervised the project, provided guidance throughout the study, and critically reviewed the manuscript; Yanzhou Li provided expertise in statistical analysis and contributed to manuscript revisions. Tongtong Bai contributed to manuscript revisions. Xiang Chen provided expertise in statistical analysis, assisted with data interpretation, and contributed to manuscript revisions. Xingya Wang contributed to manuscript revisions. Long Li provided expertise in statistical analysis, assisted with data interpretation. Yongming Yao provided expertise in statistical analysis, assisted with data interpretation, and contributed to manuscript revisions.

Corresponding author

Correspondence to Song Huang.

Ethics declarations

Conflict of interest

The authors declare no Conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Qu, Y., Huang, S., Li, Y. et al. BadCodePrompt: backdoor attacks against prompt engineering of large language models for code generation. Autom Softw Eng 32, 17 (2025). https://doi.org/10.1007/s10515-024-00485-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10515-024-00485-2

Keywords