
Top Pass: improve code generation by pass@k-maximized code ranking

  • Research Article
  • Published:
Frontiers of Computer Science

Abstract

Code generation has been greatly enhanced by the recent, profound advancements in Large Language Models (LLMs). Nevertheless, LLM-based code generation approaches still struggle to produce error-free code within a few attempts when faced with complex problems. To address this, the prevailing strategy is to sample a large number of candidate programs in the hope that at least one of them is correct. However, users of code generation systems typically expect to find a correct program after reviewing or testing only a small number of candidates; otherwise, the system is of little practical use. In this paper, we propose Top Pass, a code ranking approach that identifies potentially correct solutions from a large pool of candidates. Top Pass directly optimizes the pass@k loss function, improving quality at the top of the candidate list so that users can find a correct solution within as few attempts as possible. Experimental results on four benchmarks show that Top Pass improves the usability of code generation models by producing better rankings, in particular achieving a 32.9% relative improvement in pass@1 on CodeContests over the state-of-the-art ranking method.
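To make the evaluation criterion concrete, the following sketch (in Python, not taken from the paper) shows how a ranked candidate list is judged under pass@k: a problem counts as solved if any of the top k candidates passes all of its tests. The scoring model and test harness named here are illustrative placeholders, not the authors' implementation.

    from typing import Callable, List, Sequence

    def rank_candidates(candidates: List[str],
                        score: Callable[[str], float]) -> List[str]:
        # Order candidate programs by a ranker's score, highest first.
        # `score` stands in for a learned ranking model (placeholder).
        return sorted(candidates, key=score, reverse=True)

    def pass_at_k(ranked_pass_flags: Sequence[bool], k: int) -> bool:
        # True if at least one of the top-k ranked candidates passes all tests.
        return any(ranked_pass_flags[:k])

    # Illustrative usage (names are hypothetical):
    # ranked = rank_candidates(sampled_programs, score=ranker.score)
    # solved = pass_at_k([passes_all_tests(p) for p in ranked], k=1)

A ranking method that places correct programs near the head of this list raises pass@k for small k, which is exactly the usability criterion the abstract targets.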



Acknowledgments

This research was supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62076121, 61921006) and the Major Program (JD) of Hubei Province (2023BAA024). The authors would like to thank Hao-Yuan He and Hui Sun for helpful discussions.

Author information

Authors and Affiliations

Corresponding author

Correspondence to Ming Li.

Ethics declarations

Competing interests Ming Li is an Editorial Board member of the journal and a co-author of this article. To minimize bias, he was excluded from all editorial decision-making related to the acceptance of this article for publication. The remaining authors declare no conflict of interest.

Additional information

Zhicun Lyu obtained the BSc degree from Nanjing University, China in 2022. Currently, she is working toward the master's degree with the School of Artificial Intelligence, Nanjing University, China. She is a member of the LAMDA Group. Her research interests mainly include machine learning and data mining, especially software mining.

Xinye Li obtained the BSc and MSc degrees from Nanjing University, China in 2020 and 2023, respectively. Currently, he is working toward the PhD degree with the School of Artificial Intelligence, Nanjing University, China. He is a member of the LAMDA Group. His research interests mainly include machine learning and data mining, especially software mining. He has received a number of awards, including the National Scholarship and Outstanding Graduate of Nanjing University.

Zheng Xie obtained the PhD degree from Nanjing University, China in 2023 and the BEng degree from Xi’an Jiaotong University, China in 2016. Currently, he is a researcher at Huawei. His research interests mainly include machine learning and data mining. He has published over 10 papers in top-tier international journals and conference proceedings, including IEEE TPAMI, FCS, AAAI, IJCAI, and ICML.

Ming Li is currently a professor at the School of Artificial Intelligence, Nanjing University, China. He is also a member of the LAMDA Group. His major research interests include machine learning and data mining, especially software mining. He has served as an area chair of IJCAI and IEEE ICDM, as a senior PC member of premium artificial intelligence conferences such as AAAI, and as a PC member for other premium conferences such as KDD, NeurIPS, and ICML. He is the founding chair of the International Workshop on Software Mining. He has received various awards, including the PAKDD Early Career Award, the NSFC Excellent Youth Award, and selection for the New Century Excellent Talents program of the Ministry of Education of China.

Electronic Supplementary Material


About this article


Cite this article

Lyu, Z., Li, X., Xie, Z. et al. Top Pass: improve code generation by pass@k-maximized code ranking. Front. Comput. Sci. 19, 198341 (2025). https://doi.org/10.1007/s11704-024-40415-9


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11704-024-40415-9

Keywords