Abstract
Using few-shot demonstrations in prompts significantly enhances the generation quality of large language models (LLMs), including code generation. However, adversarial examples injected by malicious service providers via few-shot prompting pose a risk of backdoor attacks in large language models. There is no research on backdoor attacks on large language models in the few-shot prompting setting for code generation tasks. In this paper, we propose BadCodePrompt, the first backdoor attack for code generation tasks targeting LLMS in the few-shot prompting scenario, without requiring access to training data or model parameters and with lower computational overhead. BadCodePrompt exploits the insertion of triggers and poisonous code patterns into examples, causing the output of poisonous source code when there is a backdoor trigger in the end user’s query prompt. We demonstrate the effectiveness of BadCodePrompt in conducting backdoor attacks on three LLMS (GPT-4, Claude-3.5-Sonnet, and Gemini Pro-1.5) in code generation tasks without affecting the functionality of the generated code. LLMs with stronger reasoning capabilities are also more vulnerable to BadCodePrompt, with an average attack success rate of up to 98.53% for GPT-4 in two benchmark tasks. Finally, we employ state-of-the-art defenses against backdoor attacks in Prompt Engineering and show their overall ineffectiveness against BadCodePrompt. Therefore, BadCodePrompt remains a serious threat to LLMS, underscoring the urgency of developing effective defense mechanisms.












Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Data availibility
No datasets were generated or analysed during the current study.
Notes
For example, https://www.fiverr.com/gigs/ai-prompt.
References
Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C., Terry, M., Le, Q., Sutton, C.: Program Synthesis with Large Language Models. arXiv:2108.07732 [cs] (2021). Accessed 18 March 2024
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems (2020)
Cai, X., Xu, H., Xu, S., Zhang, Y., Yuan, X.: BadPrompt: Backdoor attacks on continuous prompts. In: Advances in Neural Information Processing Systems (2022)
Cai, H., Zhang, P., Dong, H., Xiao, Y., Koffas, S., Li, Y.: Towards stealthy backdoor attacks against speech recognition via elements of sound. arXiv:2307.08208 (2023)
Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning (2017). https://arxiv.org/abs/1712.05526v1
Chen, X., Salem, A., Chen, D., Backes, M., Ma, S., Shen, Q., Wu, Z., Zhang, Y.: BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements. In: Annual Computer Security Applications Conference, pp. 554–569 (2021a). https://doi.org/10.1145/3485832.3485837. arXiv:2006.01043. Accessed 26 May 2023
Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.d.O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., Ryder, N., Pavlov, M., Power, A., Kaiser, L., Bavarian, M., Winter, C., Tillet, P., Such, F.P., Cummings, D., Plappert, M., Chantzis, F., Barnes, E., Herbert-Voss, A., Guss, W.H., Nichol, A., Paino, A., Tezak, N., Tang, J., Babuschkin, I., Balaji, S., Jain, S., Saunders, W., Hesse, C., Carr, A.N., Leike, J., Achiam, J., Misra, V., Morikawa, E., Radford, A., Knight, M., Brundage, M., Murati, M., Mayer, K., Welinder, P., McGrew, B., Amodei, D., McCandlish, S., Sutskever, I., Zaremba, W.: Evaluating large language models trained on code (2021b). https://doi.org/10.48550/arXiv.2107.03374
Conti, M., Dragoni, N., Lesyk, V.: A survey of man in the middle attacks. IEEE Commun. Surv. Tutor. 18(3), 2027–2051 (2016)
Contributors, D.: DriveLM: Drive on language (2023). https://github.com/OpenDriveLab/DriveLM
Dai, J., Chen, C., Li, Y.: A backdoor attack against LSTM-based text classification systems. IEEE Access Pract. Innov. Open Solut. 7, 138872–138878 (2019)
Du, W., Liu, G.: A survey of backdoor attack in deep learning. J. Cyber Secur. 7(3), 1–16 (2022). https://doi.org/10.19363/J.CNKI.CN10-1380/TN.2022.05.01
Fried, D., Aghajanyan, A., Lin, J., Wang, S., Wallace, E., Shi, F., Zhong, R., Yih, W.-t., Zettlemoyer, L., Lewis, M.: InCoder: A Generative Model for Code Infilling and Synthesis. arXiv:2204.05999 [cs] (2023)
Gu, T., Liu, K., Dolan-Gavitt, B., Garg, S.: BadNets: Evaluating backdooring attacks on deep neural networks. IEEE Access Pract. Innov. Open Solut. 7, 47230–47244 (2019)
Hong, S., Carlini, N., Kurakin, A.: Handcrafted backdoors in deep neural networks. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) Advances in Neural Information Processing Systems (2022)
Huang, K., Li, Y., Wu, B., Qin, Z., Ren, K.: Backdoor defense via decoupling the training process. In: International Conference on Learning Representations (ICLR) (2022)
Jiang, X., Dong, Y., Wang, L., Zheng, F., Shang, Q., Li, G., Jin, Z., Jiao, W.: Self-planning code generation with large language models. ACM Trans. Softw. Eng. Methodol. (2023)
Kandpal, N., Jagielski, M., Tramèr, F., Carlini, N.: Backdoor attacks for in-context learning with language models. In: The Second Workshop on New Frontiers in Adversarial Machine Learning (2023)
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) Advances in Neural Information Processing Systems (2022)
Li, Y., Choi, D., Chung, J., Kushman, N., Schrittwieser, J., Leblond, R., Eccles, T., Keeling, J., Gimeno, F., Lago, A.D., Hubert, T., Choy, P., d’Autume, C.d.M., Babuschkin, I., Chen, X., Huang, P.-S., Welbl, J., Gowal, S., Cherepanov, A., Molloy, J., Mankowitz, D.J., Robson, E.S., Kohli, P., Freitas, N., Kavukcuoglu, K., Vinyals, O.: Competition-level code generation with AlphaCode. Science (New York, N.Y.) 378(6624), 1092–1097 (2022). https://doi.org/10.1126/science.abq1158
Li, J., Zhao, Y., Li, Y., Li, G., Jin, Z.: Acecoder: Utilizing existing code to enhance code generation. arXiv preprint arXiv:2303.17780 (2023)
Liu, Y., Ma, S., Aafer, Y., Lee, W.-C., Zhai, J., Wang, W., Zhang, X.: Trojaning attack on neural networks (2018). https://doi.org/10.14722/NDSS.2018.23291
Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., Chen, W.: What makes good in-context examples for GPT-3? arXiv preprint arXiv:2101.06804 (2021)
Lou, Q., Liu, Y., Feng, B.: TrojText: Test-time invisible textual trojan insertion. In: The Eleventh International Conference on Learning Representations (2023)
Mei, K., Li, Z., Wang, Z., Zhang, Y., Ma, S.: NOTABLE: Transferable backdoor attacks against prompt-based NLP models. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (volume 1: Long Papers) (2023)
Miller, D.J., Xiang, Z., Kesidis, G.: Adversarial learning in statistical classification: a comprehensive review of defenses against attacks. Proc. IEEE 108, 402–433 (2020)
Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., Zettlemoyer, L.: Rethinking the role of demonstrations: What makes in-context learning work? In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022)
Moor, M., Huang, Q., Wu, S., Yasunaga, M., Zakka, C., Dalmia, Y., Reis, E.P., Rajpurkar, P., Leskovec, J.: Med-flamingo: a multimodal medical few-shot learner. arXiv:2307.15189 [cs.CV] (2023)
Nijkamp, E., Pang, B., Hayashi, H., Tu, L., Wang, H., Zhou, Y., Savarese, S., Xiong, C.: CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. arXiv:2203.13474 [cs] (2023)
OpenAI: GPT-4 technical report. arXiv (2023)
Panda, A., Zhang, Z., Yang, Y., Mittal, P.: Teach GPT to phish. In: The Second Workshop on New Frontiers in Adversarial Machine Learning (2023)
Penedo, G., Malartic, Q., Hesslow, D., Cojocaru, R., Cappelli, A., Alobeidli, H., Pannier, B., Almazrouei, E., Launay, J.: The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only. arXiv preprint arXiv:2306.01116 (2023)
Qi, F., Chen, Y., Zhang, X., Li, M., Liu, Z., Sun, M.: Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer (2021a). https://doi.org/10.48550/arXiv.2110.07139. arXiv:2110.07139. Accessed 25 April 2023
Qi, F., Li, M., Chen, Y., Zhang, Z., Liu, Z., Wang, Y., Sun, M.: Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger. arXiv:2105.12400 [cs] (2021b). Accessed 08 May 2024
Qi, X., Zhu, J., Xie, C., Yang, Y.: Subnet Replacement: Deployment-stage backdoor attack against deep neural networks in gray-box setting. In: International Conference on Learning Representations (ICLR) Workshop on Security and Safety in Machine Learning Systems (2021c)
Ren, S., Guo, D., Lu, S., Zhou, L., Liu, S., Tang, D., Sundaresan, N., Zhou, M., Blanco, A., Ma, S.: Codebleu: a method for automatic evaluation of code synthesis. arXiv preprint arXiv:2009.10297 (2020)
Shen, B., Zhang, J., Chen, T., Zan, D., Geng, B., Fu, A., Zeng, M., Yu, A., Ji, J., Zhao, J., Guo, Y., Wang, Q.: PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback. arXiv (2023)
Sonoda, Y., Kurokawa, R., Nakamura, Y., Kanzawa, J., Kurokawa, M., Ohizumi, Y., Gonoi, W., Abe, O.: Diagnostic performances of GPT-4o, claude 3 opus, and gemini 1.5 pro in diagnosis please cases. Jpn. J. Radiol. 1–5 (2024)
Team, G., Anil, R., Borgeaud, S., Wu, Y., Alayrac, J.-B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., et al.: Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)
Wan, A., Wallace, E., Shen, S., Klein, D.: Poisoning language models during instruction tuning. In: International Conference on Machine Learning (2023)
Wang, B., Pei, H., Pan, B., Chen, Q., Wang, S., Li, B.: T3: Tree-autoencoder regularized adversarial text generation for targeted attack. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6134–6150 (2020)
Wang, B., Chen, W., Pei, H., Xie, C., Kang, M., Zhang, C., Xu, C., Xiong, Z., Dutta, R., Schaeffer, R., Truong, S.T., Arora, S., Mazeika, M., Hendrycks, D., Lin, Z., Cheng, Y., Koyejo, S., Song, D., Li, B.: DecodingTrust: a comprehensive assessment of trustworthiness in GPT models. arXiv:2306.11698 [cs.CL] (2023)
Weber, M., Xu, X., Karlaš, B., Zhang, C., Li, B.: RAB: Provable robustness against backdoor attacks. In: 2023 IEEE Symposium on Security and Privacy (SP), pp. 640–657 (2023)
Wei, J., Wei, J., Tay, Y., Tran, D., Webson, A., Lu, Y., Chen, X., Liu, H., Huang, D., Zhou, D., Ma, T.: Larger language models do in-context learning differently. arXiv:2303.03846 (2023)
Wu, J., Gaur, Y., Chen, Z., Zhou, L., Zhu, Y., Wang, T., Li, J., Liu, S., Ren, B., Liu, L., Wu, Y.: On decoder-only architecture for speech-to-text and large language model integration. arXiv:2307.03917 [eess.AS] (2023)
Xiang, Z., Xiong, Z., Li, B.: CBD: a certified backdoor detector based on local dominant probability. In: Advances in Neural Information Processing Systems (NeurIPS) (2023a)
Xiang, Z., Xiong, Z., Li, B.: UMD: Unsupervised model detection for X2X backdoor attacks. In: International Conference on Machine Learning (ICML) (2023b)
Xiang, Z., Jiang, F., Xiong, Z., Ramasubramanian, B., Poovendran, R., Li, B.: BadChain: Backdoor chain-of-thought prompting for large language models (2024)
Xie, T., Wu, C.H., Shi, P., Zhong, R., Scholak, T., Yasunaga, M., Wu, C.-S., Zhong, M., Yin, P., Wang, S.I., Zhong, V., Wang, B., Li, C., Boyle, C., Ni, A., Yao, Z., Radev, D., Xiong, C., Kong, L., Zhang, R., Smith, N.A., Zettlemoyer, L., Yu, T.: UnifiedSKG: Unifying and multi-tasking structured knowledge grounding with text-to-text language models. arXiv: 2201.05966 [cs.CL] (2022)
Xu, L., Chen, Y., Cui, G., Gao, H., Liu, Z.: Exploring the universal vulnerability of prompt-based learning paradigm. In: Findings of the Association for Computational Linguistics: NAACL 2022 (2022)
Xu, J., Ma, M.D., Wang, F., Xiao, C., Chen, M.: Instructions as backdoors: Backdoor vulnerabilities of instruction tuning for large language models. arXiv:2305.14710 (2023)
Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q.V., Zhou, D., Chen, X.: Large language models as optimizers. arXiv:2309.03409 (2023)
Yasunaga, M., Chen, X., Li, Y., Pasupat, P., Leskovec, J., Liang, P., Chi, E.H., Zhou, D.: Large language models as analogical reasoners. In: The Twelfth International Conference on Learning Representations (2024)
Zeng, G., Qi, F., Zhou, Q., Zhang, T., Ma, Z., Hou, B., Zang, Y., Liu, Z., Sun, M.: OpenAttack: An Open-source Textual Adversarial Attack Toolkit. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, pp. 363–371 (2021). https://doi.org/10.18653/v1/2021.acl-demo.43. arXiv:2009.09191. Accessed 21 August 2023
Zhang, H., Huang, J., Li, Z., Naik, M., Xing, E.: Improved logical reasoning of language models via differentiable symbolic programming. In: Rogers, A., Boyd-Graber, J., Okazaki, N. (eds.) Findings of the Association for Computational Linguistics: ACL 2023, pp. 3062–3077. Association for Computational Linguistics, Toronto, Canada (2023). https://doi.org/10.18653/v1/2023.findings-acl.191
Zhang, K., Li, J., Li, G., Shi, X., Jin, Z.: CodeAgent: Enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges (2024)
Zhao, S., Ma, X., Zheng, X., Bailey, J., Chen, J., Jiang, Y.-G.: Clean-label backdoor attacks on video recognition models. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Zhao, Z., Wallace, E., Feng, S., Klein, D., Singh, S.: Calibrate before use: Improving few-shot performance of language models. In: International Conference on Machine Learning, pp. 12697–12706. PMLR (2021)
Zhao, W., Geva, M., Lin, B.Y., Yasunaga, M., Madaan, A., Yu, T.: Complex reasoning in natural language. In: Chen, Y.-N.V., Margot, M., Reddy, S. (eds.) Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (volume 6: Tutorial Abstracts), 11–20. Association for Computational Linguistics, Toronto, Canada (2023a). https://doi.org/10.18653/v1/2023.acl-tutorials.2
Zhao, S., Wen, J., Tuan, L.A., Zhao, J., Fu, J.: Prompt as triggers for backdoor attack: examining the vulnerability in language models. arXiv:2305.01219 [cs.CL] (2023b)
Zheng, Q., Xia, X., Zou, X., Dong, Y., Wang, S., Xue, Y., Wang, Z., Shen, L., Wang, A., Li, Y., Su, T., Yang, Z., Tang, J.: CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X (2023). arXiv:2303.17568. Accessed 28 June 2023
Zheng, Z., Ning, K., Wang, Y., Zhang, J., Zheng, D., Ye, M., Chen, J.: A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends. arXiv:2311.10372 [cs] (2024). Accessed 29 March 2024
Zhuo, T.Y., Vu, M.C., Chim, J., Hu, H., Yu, W., Widyasari, R., Yusuf, I.N.B., Zhan, H., He, J., Paul, I., et al.: Bigcodebench: Benchmarking code generation with diverse function calls and complex instructions. arXiv preprint arXiv:2406.15877 (2024)
Zou, A., Wang, Z., Kolter, J.Z., Fredrikson, M.: Universal and transferable adversarial attacks on aligned language models. arXiv:2307.15043 [cs.CL] (2023)
Funding
This work was supported by and China Vocational Education Society Huawei Technologies Co., Ltd. 2024 Annual Industry-Education Integration Special Topic (No. XHHWCJRH2024-02-01-02), 2023 Higher Education Scientific Research Planning Project of China Society of Higher Education (No. 23PG0408), 2023 Philosophy and Social Science Research Programs in Jiangsu Province (No. 2023SJSZ0993), Nantong Science and Technology Project (No. JC2023070), Key Project of Jiangsu Province Education Science 14th Five-Year Plan(Grant No. B-b/2024/02/41) and the Open Fund of Advanced Cryptography and System Security Key Laboratory of Sichuan Province (Grant No. SKLACSS-202407). This work is sponsored by the Cultivation of Young and Middle-aged Academic Leaders in Qing Lan Project of Jiangsu Province.
Author information
Authors and Affiliations
Contributions
Yubin Qu conceived and designed the whole study, collected and analyzed the data, and wrote the manuscript; Song Huang supervised the project, provided guidance throughout the study, and critically reviewed the manuscript; Yanzhou Li provided expertise in statistical analysis and contributed to manuscript revisions. Tongtong Bai contributed to manuscript revisions. Xiang Chen provided expertise in statistical analysis, assisted with data interpretation, and contributed to manuscript revisions. Xingya Wang contributed to manuscript revisions. Long Li provided expertise in statistical analysis, assisted with data interpretation. Yongming Yao provided expertise in statistical analysis, assisted with data interpretation, and contributed to manuscript revisions.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no Conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qu, Y., Huang, S., Li, Y. et al. BadCodePrompt: backdoor attacks against prompt engineering of large language models for code generation. Autom Softw Eng 32, 17 (2025). https://doi.org/10.1007/s10515-024-00485-2
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10515-024-00485-2