
Top Pass: improve code generation by pass@k-maximized code ranking

  • Research Article
  • Published:
Frontiers of Computer Science

Abstract

Code generation has been greatly enhanced by the recent, profound advancements in Large Language Models (LLMs). Nevertheless, LLM-based code generation approaches still struggle to produce error-free code within a few attempts when faced with complex problems. To address this, the prevailing strategy is to sample a large number of candidate programs in the hope that at least one of them is correct. However, users of code generation systems typically expect to find a correct program after reviewing or testing only a small number of candidates; otherwise, the system is of little practical use. In this paper, we propose Top Pass, a code ranking approach that identifies potentially correct solutions from a large pool of candidates. Top Pass directly optimizes the pass@k loss function, improving quality at the top of the candidate list so that users can find a correct solution within as few attempts as possible. Experimental results on four benchmarks show that Top Pass improves the usability of code generation models by producing better rankings, in particular achieving a 32.9% relative improvement in pass@1 on CodeContests over the state-of-the-art ranking method.
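To make the evaluation criterion concrete, the following sketch (in Python, not taken from the paper) shows how a ranked candidate list is judged under pass@k: a problem counts as solved if any of the top k candidates passes all of its tests. The scoring model and test harness named here are illustrative placeholders, not the authors' implementation.

    from typing import Callable, List, Sequence

    def rank_candidates(candidates: List[str],
                        score: Callable[[str], float]) -> List[str]:
        # Order candidate programs by a ranker's score, highest first.
        # `score` stands in for a learned ranking model (placeholder).
        return sorted(candidates, key=score, reverse=True)

    def pass_at_k(ranked_pass_flags: Sequence[bool], k: int) -> bool:
        # True if at least one of the top-k ranked candidates passes all tests.
        return any(ranked_pass_flags[:k])

    # Illustrative usage (names are hypothetical):
    # ranked = rank_candidates(sampled_programs, score=ranker.score)
    # solved = pass_at_k([passes_all_tests(p) for p in ranked], k=1)

A ranking method that places correct programs near the head of this list raises pass@k for small k, which is exactly the usability criterion the abstract targets.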



Acknowledgments

This research was supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62076121, 61921006) and the Major Program (JD) of Hubei Province (2023BAA024). The authors would like to thank Hao-Yuan He and Hui Sun for helpful discussions.

Author information

Authors and Affiliations

Corresponding author

Correspondence to Ming Li.

Ethics declarations

Competing interests Ming Li is an Editorial Board member of the journal and a co-author of this article. To minimize bias, he was excluded from all editorial decision-making related to the acceptance of this article for publication. The remaining authors declare no conflict of interest.

Additional information

Zhicun Lyu obtained the BSc degree from Nanjing University, China in 2022. Currently, she is working toward the master's degree with the School of Artificial Intelligence, Nanjing University, China. She is a member of the LAMDA Group. Her research interests mainly include machine learning and data mining, especially software mining.

Xinye Li obtained the BSc and MSc degrees from Nanjing University, China in 2020 and 2023, respectively. Currently, he is working toward the PhD degree with the School of Artificial Intelligence, Nanjing University, China. He is a member of the LAMDA Group. His research interests mainly include machine learning and data mining, especially software mining. He has received a number of awards, including the National Scholarship and Outstanding Graduate of Nanjing University.

Zheng Xie obtained the PhD degree from Nanjing University, China in 2023 and the BEng degree from Xi’an Jiaotong University, China in 2016. Currently, he is a researcher at Huawei. His research interests mainly include machine learning and data mining. He has published over 10 papers in top-tier international journals and conference proceedings, including IEEE TPAMI, FCS, AAAI, IJCAI, and ICML.

Ming Li is currently a professor at the School of Artificial Intelligence, Nanjing University, China. He is also a member of the LAMDA Group. His major research interests include machine learning and data mining, especially software mining. He has served as an area chair of IJCAI and IEEE ICDM, as a senior PC member of premium artificial intelligence conferences such as AAAI, and as a PC member for other premium conferences such as KDD, NeurIPS, and ICML. He is the founding chair of the International Workshop on Software Mining. He has received various awards, including the PAKDD Early Career Award, the NSFC Excellent Youth Award, and selection for the New Century Excellent Talents program of the Ministry of Education of China.

Electronic Supplementary Material


About this article


Cite this article

Lyu, Z., Li, X., Xie, Z. et al. Top Pass: improve code generation by pass@k-maximized code ranking. Front. Comput. Sci. 19, 198341 (2025). https://doi.org/10.1007/s11704-024-40415-9


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11704-024-40415-9

Keywords