ABSTRACT
Very large-scale (VLS) deep learning models are capable of generating meaningful code snippets, yet their performance drops dramatically as the coding task becomes more complex. Although fully neural approaches have been proposed to address this problem, their practical value remains limited. In this work, we propose a neuro-symbolic approach that integrates the symbolic nature of programming with existing neural language models. We divide a programming task into three phases: decomposing the task into a hierarchy of functions, completing each function, and handling corner cases. Because each phase can be carried out by language models, the coding process can be fully automated. Our contribution is three-fold. First, we show that with little human help, VLS language models are capable of completing non-trivial programming tasks. Second, we provide a number of empirical insights for creating prompt templates that help language models generate better code. Third, compared with existing approaches, ours is far more practical for programmers and researchers to follow. The programming project generated by our fully automated approach and part of the ablation-study code are available at https://github.com/BiEchi/FAP.
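The three-phase pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: `generate` stands in for a call to a large language model (the actual model, prompt templates, and parsing are assumptions), and is stubbed here so the control flow is self-contained.

```python
def generate(prompt: str) -> str:
    """Stub for an LLM completion call (a real system would query a code model)."""
    return f"# completion for: {prompt[:40]}"

def decompose(task: str) -> list[str]:
    """Phase 1: prompt the model to break the task into a hierarchy of functions."""
    plan = generate(f"Break this task into function signatures:\n{task}")
    return [line for line in plan.splitlines() if line.strip()]

def implement(signature: str) -> str:
    """Phase 2: prompt the model to complete one function body."""
    return generate(f"Implement this function:\n{signature}")

def harden(body: str) -> str:
    """Phase 3: prompt the model to handle corner cases in the completed code."""
    return generate(f"Add corner-case handling to:\n{body}")

def automate(task: str) -> list[str]:
    """Chain the three phases so the whole coding process runs without a human."""
    return [harden(implement(sig)) for sig in decompose(task)]
```

Each phase is a separate prompt, so the quality of the prompt template at each stage (the paper's second contribution) directly shapes the final output.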
Index Terms
- A Practical Three-phase Approach To Fully Automated Programming Using System Decomposition And Coding Copilots