DOI: 10.1145/3609437.3609459

Measuring Efficient Code Generation with GEC

Published: 05 October 2023


Although efficiency is a core metric in programming, code generated by recent large language models is often inefficient and struggles to meet the real-time requirements of algorithmic tasks. Relatively little research evaluates whether models select efficient algorithms, and it is difficult to rigorously assess a model's ability to choose an efficient algorithmic solution correctly. Moreover, selecting an efficient solution often depends on applying the right problem-solving technique, which calls for deeper study of algorithmic reasoning. To address this challenge, we introduce the Generation of Efficient Code (GEC) benchmark, which evaluates the ability to select efficient algorithmic solutions. Unlike conventional code generation benchmarks, GEC measures a model's ability to produce efficient code that satisfies the problem requirements when given both a natural-language problem description and an inefficient implementation. We propose two novel metrics that assess the efficiency of the generated code and, through it, the model's ability to generate efficient code. The benchmark includes 3,712 problems, 31,577 pairs of efficient and inefficient code, and 13,092 alternative efficient solutions. We evaluate mainstream code generation models on GEC. As code efficiency becomes more important to society in the coming years, our benchmark will provide an essential standard for tracking research progress. Our dataset and models are open-source and can be accessed at
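The abstract does not spell out its two efficiency metrics, so the following is only a minimal Python sketch of how an evaluation of this kind might work. It assumes that each problem supplies test cases as (arguments, expected output) pairs and that efficiency is judged by wall-clock time relative to the inefficient reference program. The names `evaluate`, `summarize`, `pass_rate`, and `efficient_rate`, the best-of-`repeats` timing, and the `min_speedup` threshold are all hypothetical; they are not GEC's metrics or its evaluation harness.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Result:
    """Outcome of one generated program measured against its inefficient reference."""
    correct: bool        # True if the candidate passed every test case
    baseline_s: float    # best runtime of the inefficient reference (seconds)
    candidate_s: float   # best runtime of the model-generated code (seconds)


def evaluate(candidate: Callable, baseline: Callable,
             tests: Sequence[tuple], repeats: int = 3) -> Result:
    """Check correctness on (args, expected) pairs, then time both programs."""
    correct = all(candidate(*args) == expected for args, expected in tests)

    def best_time(fn: Callable) -> float:
        # Run the whole test set `repeats` times and keep the fastest pass.
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for args, _expected in tests:
                fn(*args)
            best = min(best, time.perf_counter() - start)
        return best

    return Result(correct, best_time(baseline), best_time(candidate))


def summarize(results: Sequence[Result], min_speedup: float = 1.0) -> dict:
    """Hypothetical aggregates; NOT the two metrics defined in the GEC paper."""
    n = len(results)
    passed = [r for r in results if r.correct]
    # Counted as efficient: correct AND more than `min_speedup` times faster.
    # Multiplying instead of dividing avoids dividing by a zero runtime.
    efficient = [r for r in passed if r.candidate_s * min_speedup < r.baseline_s]
    return {"pass_rate": len(passed) / n, "efficient_rate": len(efficient) / n}


# Toy problem: sum of 0..n, with an O(n) reference and an O(1) rewrite.
def slow_sum(n: int) -> int:
    return sum(range(n + 1))


def fast_sum(n: int) -> int:
    return n * (n + 1) // 2


TESTS = [((10,), 55), ((100_000,), 5_000_050_000)]

if __name__ == "__main__":
    print(summarize([evaluate(fast_sum, slow_sum, TESTS)]))
```

Checking correctness before looking at runtime keeps a fast but wrong program from counting as an improvement, and taking the minimum over several runs is a common way to damp scheduler noise in wall-clock measurements.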



Cited By

  • (2025) Measuring code efficiency optimization capabilities with ACEOB. Journal of Systems and Software, 219, 112250. DOI: 10.1016/j.jss.2024.112250. Online publication date: Jan-2025
  • (2024) Software engineering education in the era of conversational AI: current trends and future directions. Frontiers in Artificial Intelligence, 7. DOI: 10.3389/frai.2024.1436350. Online publication date: 29-Aug-2024





Published In

ACM Other conferences
Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware
August 2023
332 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Benchmark Datasets
  2. Code Efficiency Optimization
  3. Code Generation


  • Research-article
  • Research
  • Refereed limited


Internetware 2023

Acceptance Rates

Overall Acceptance Rate 55 of 111 submissions, 50%




Article Metrics

  • Downloads (last 12 months): 58
  • Downloads (last 6 weeks): 2
Reflects downloads up to 20 Feb 2025
