Abstract
Code generation has been greatly enhanced by recent advances in Large Language Models (LLMs). Nevertheless, LLM-based code generation approaches still struggle to produce error-free code within a few tries when faced with complex problems. To address this, the prevailing strategy is to sample a large number of candidate programs in the hope that at least one of them works. However, users of code generation systems usually expect to find a correct program after reviewing or testing only a small number of candidates; otherwise, the system is of little practical help. In this paper, we propose Top Pass, a code ranking approach that identifies potentially correct solutions from a large pool of candidates. Top Pass directly optimizes the pass@k loss function, improving the quality at the top of the candidate list so that users can find a correct solution within as few tries as possible. Experimental results on four benchmarks show that Top Pass improves the usability of code generation models by producing better rankings, in particular achieving a 32.9% relative improvement in pass@1 on CodeContests over the state-of-the-art ranking method.
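The body of the paper is not reproduced on this page, so the sketch below is only background for the metric named in the abstract: a minimal Python implementation of the standard unbiased pass@k estimator popularized with the HumanEval evaluation (Chen et al., 2021), not the Top Pass ranking loss itself. The function name pass_at_k and the example numbers are illustrative, not taken from the paper.

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k for a single problem.

    n: number of sampled candidate programs
    c: number of candidates that pass all unit tests
    k: number of top candidates the user is willing to inspect
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct candidate
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 100 samples of which 5 are correct
print(pass_at_k(n=100, c=5, k=1))   # ≈ 0.05
print(pass_at_k(n=100, c=5, k=10))  # ≈ 0.42

A ranker that places correct programs near the top of the candidate list raises the chance that the few candidates a user actually inspects contain a correct one, which is what the abstract's pass@1 and pass@k numbers measure.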
Acknowledgments
This research was supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62076121, 61921006) and the Major Program (JD) of Hubei Province (2023BAA024). The authors would like to thank Hao-Yuan He and Hui Sun for helpful discussions.
Ethics declarations
Competing interests Ming Li is an Editorial Board member of the journal and a co-author of this article. To minimize bias, he was excluded from all editorial decision-making related to the acceptance of this article for publication. The remaining authors declare no conflict of interest.
Additional information
Zhicun Lyu obtained the BSc degree from Nanjing University, China in 2022. Currently, she is working toward the master's degree at the School of Artificial Intelligence, Nanjing University, China. She is a member of the LAMDA Group. Her research interests mainly include machine learning and data mining, especially software mining.
Xinye Li obtained the BSc and MSc degrees from Nanjing University, China in 2020 and 2023, respectively. Currently, he is working toward the PhD degree at the School of Artificial Intelligence, Nanjing University, China. He is a member of the LAMDA Group. His research interests mainly include machine learning and data mining, especially software mining. He has received a number of awards, including the National Scholarship and Outstanding Graduate of Nanjing University.
Zheng Xie obtained the BEng degree from Xi'an Jiaotong University, China in 2016 and the PhD degree from Nanjing University, China in 2023. Currently, he is a researcher at Huawei. His research interests mainly include machine learning and data mining. He has published over 10 papers in top-tier international journals and conference proceedings, including IEEE TPAMI, FCS, AAAI, IJCAI, and ICML.
Ming Li is currently a professor at the School of Artificial Intelligence, Nanjing University, China, and a member of the LAMDA Group. His major research interests include machine learning and data mining, especially software mining. He has served as an area chair for IJCAI and IEEE ICDM, as a senior PC member for premier artificial intelligence conferences such as AAAI, and as a PC member for other premier conferences such as KDD, NeurIPS, and ICML. He is the founding chair of the International Workshop on Software Mining. He has received various awards, including the PAKDD Early Career Award, the NSFC Excellent Youth Award, and selection for the New Century Excellent Talents program of the Ministry of Education of China.
About this article
Cite this article
Lyu, Z., Li, X., Xie, Z. et al. Top Pass: improve code generation by pass@k-maximized code ranking. Front. Comput. Sci. 19, 198341 (2025). https://doi.org/10.1007/s11704-024-40415-9