Seq2Seq or Seq2Tree: Generating Code Using Both Paradigms via Mutual Learning

ABSTRACT
Code generation aims to automatically generate source code from a given natural language (NL) description, a task of great significance for automated software development. Some code generation models follow the language model-based paradigm (LMBP) and generate source code token by token; others derive the program's grammatical structure by generating its abstract syntax tree (AST), i.e., they follow the grammatical structure-based paradigm (GSBP). Existing studies generate code with one of these two paradigms alone. Human developers, however, draw on both: they build the grammatical structure of the code while also writing statements sequentially, the way a language model does. We therefore argue that code generation should consider both GSBP and LMBP. In this paper, we use mutual learning to combine the two classes of models so that the two paradigms train together. To implement the mutual learning framework, we design alignment methods between code and AST; under this framework, the models are enhanced through shared encoders and knowledge interaction at aligned training steps. We experiment on three Python-based code generation datasets. Experimental results and an ablation analysis confirm the effectiveness of our approach and demonstrate that considering both GSBP and LMBP improves code generation performance.
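To make the framework concrete, the sketch below shows one plausible way to wire a Seq2Seq (LMBP) decoder and a Seq2Tree (GSBP) decoder to a shared encoder and couple them with a mutual-learning loss. This is an illustration of the idea in the abstract, not the authors' implementation: the toy GRU architecture, the action-vocabulary layout (grammar rules followed by GenToken actions that reuse the code-token vocabulary), and the `batch["align"]` mapping from code positions to AST action steps are all our assumptions.

```python
# A minimal sketch of mutual learning between a Seq2Seq (LMBP) branch and a
# Seq2Tree (GSBP) branch over a shared encoder -- illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

V_NL, V_CODE, N_RULES, D = 1000, 800, 120, 128
V_ACT = N_RULES + V_CODE  # assumption: GenToken actions reuse the code vocabulary

class SharedEncoder(nn.Module):
    """Encodes the NL description once; both decoders read its output."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(V_NL, D)
        self.rnn = nn.GRU(D, D, batch_first=True)

    def forward(self, nl_ids):  # (B, T_nl) -> (B, T_nl, D)
        return self.rnn(self.embed(nl_ids))[0]

class Decoder(nn.Module):
    """One instance per branch: the Seq2Seq decoder targets code tokens
    (LMBP), the Seq2Tree decoder targets grammar actions (GSBP)."""
    def __init__(self, v_out):
        super().__init__()
        self.embed = nn.Embedding(v_out, D)
        self.rnn = nn.GRU(D, D, batch_first=True)
        self.out = nn.Linear(D, v_out)

    def forward(self, prev_ids, memory):  # teacher forcing
        h0 = memory.mean(dim=1, keepdim=True).transpose(0, 1).contiguous()
        return self.out(self.rnn(self.embed(prev_ids), h0)[0])

def sym_kl(p_logits, q_logits):
    """Symmetric KL between two distributions over the shared code vocabulary."""
    p, q = F.softmax(p_logits, dim=-1), F.softmax(q_logits, dim=-1)
    return 0.5 * (F.kl_div(q.log(), p, reduction="batchmean")
                  + F.kl_div(p.log(), q, reduction="batchmean"))

def training_step(enc, seq_dec, tree_dec, batch, lam=0.5):
    memory = enc(batch["nl"])
    seq_logits = seq_dec(batch["code_in"], memory)   # (B, Tc, V_CODE)
    tree_logits = tree_dec(batch["act_in"], memory)  # (B, Ta, V_ACT)
    loss = (F.cross_entropy(seq_logits.transpose(1, 2), batch["code_out"])
            + F.cross_entropy(tree_logits.transpose(1, 2), batch["act_out"]))
    # Knowledge interaction at aligned steps: wherever the Seq2Tree decoder
    # emits a terminal token (a GenToken action), renormalize its logits over
    # the code-token sub-vocabulary and pull the two distributions together.
    for c, a in batch["align"]:  # assumed (code_pos, action_pos) pairs
        loss = loss + lam * sym_kl(seq_logits[:, c], tree_logits[:, a, N_RULES:])
    return loss

if __name__ == "__main__":
    enc, seq_dec, tree_dec = SharedEncoder(), Decoder(V_CODE), Decoder(V_ACT)
    B, Tc, Ta = 2, 6, 9
    batch = {
        "nl": torch.randint(0, V_NL, (B, 5)),
        "code_in": torch.randint(0, V_CODE, (B, Tc)),
        "code_out": torch.randint(0, V_CODE, (B, Tc)),
        "act_in": torch.randint(0, V_ACT, (B, Ta)),
        "act_out": torch.randint(0, V_ACT, (B, Ta)),
        "align": [(1, 3), (4, 7)],  # toy code<->AST token alignment
    }
    print(training_step(enc, seq_dec, tree_dec, batch).item())
```

The symmetric KL term is what lets the two paradigms teach each other: at every aligned step the token-level model sees where the grammar-aware model places its probability mass, and vice versa, on top of each branch's ordinary cross-entropy objective.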