Abstract
Training deep learning models on source code has gained significant traction recently. Since such models reason about vectors of numbers, source code needs to be converted to a code representation before vectorization. Numerous approaches have been proposed to represent source code, from sequences of tokens to abstract syntax trees. However, no systematic study has examined the effect of code representation on learning performance. Through a controlled experiment, we examine the impact of various code representations on model accuracy and usefulness in deep learning-based program repair. We train 21 different generative models that suggest fixes for name-based bugs, including 14 different homogeneous code representations, four mixed representations for the buggy and fixed code, and three different embeddings. We assess whether the fix suggestions produced by the model in each code representation are automatically patchable, meaning they can be transformed into valid code that is ready to be applied to the buggy code to fix it. We also conduct a developer study to qualitatively evaluate the usefulness of inferred fixes in different code representations. Our results highlight the importance of code representation and its impact on learning and usefulness. Our findings indicate that (1) while code abstractions help the learning process, they can adversely impact the usefulness of inferred fixes from a developer’s point of view; this emphasizes the need to look at the generated patches from the practitioner’s perspective, which is often neglected in the literature, (2) mixed representations can outperform homogeneous code representations, and (3) bug type can affect the effectiveness of different code representations: although current techniques use a single code representation for all bug types, there is no single best code representation applicable to all bug types.
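To make the idea of an abstracted code representation and of "automatic patchability" more concrete, here is a minimal sketch in Python. It is an illustrative assumption, not the study's pipeline. It uses Python's standard `tokenize` module to build a token-sequence representation, then replaces identifiers with placeholders (`ID_1`, `ID_2`, ...) through a mapping shared by the buggy and fixed code. The helper names `token_sequence`, `abstract`, and `concretize` are hypothetical. The example is a swapped-argument bug, one kind of name-based bug. `concretize` shows one possible way to map a predicted abstract fix back to concrete code, returning `None` when the prediction uses a placeholder that has no known name.

```python
from __future__ import annotations

import io
import keyword
import tokenize

# Layout-only token types that carry no content for the model.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}


def token_sequence(source: str) -> list[str]:
    """Token-sequence representation of a Python snippet (hypothetical helper)."""
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in toks if t.type not in _SKIP]


def abstract(tokens: list[str], mapping: dict[str, str]) -> list[str]:
    """Replace each identifier with a placeholder ID_1, ID_2, ...

    `mapping` is updated in place so a buggy/fixed pair can share it.
    Keywords, operators, and literals are kept as they are.
    """
    out = []
    for tok in tokens:
        if tok.isidentifier() and not keyword.iskeyword(tok):
            # First occurrence of a name gets the next free placeholder.
            mapping.setdefault(tok, f"ID_{len(mapping) + 1}")
            out.append(mapping[tok])
        else:
            out.append(tok)
    return out


def concretize(abstract_tokens: list[str], mapping: dict[str, str]) -> list[str] | None:
    """Map placeholders back to names; None means the output is not patchable."""
    inverse = {placeholder: name for name, placeholder in mapping.items()}
    out = []
    for tok in abstract_tokens:
        if tok.startswith("ID_"):
            if tok not in inverse:
                return None  # Placeholder with no concrete name to restore.
            out.append(inverse[tok])
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    mapping: dict[str, str] = {}
    buggy = abstract(token_sequence("set_size(height, width)\n"), mapping)
    fixed = abstract(token_sequence("set_size(width, height)\n"), mapping)
    print(buggy)  # ['ID_1', '(', 'ID_2', ',', 'ID_3', ')']
    print(fixed)  # ['ID_1', '(', 'ID_3', ',', 'ID_2', ')']
    print(concretize(fixed, mapping))  # ['set_size', '(', 'width', ',', 'height', ')']
    print(concretize(["ID_1", "(", "ID_4", ")"], mapping))  # None
```

In this sketch the abstracted pair differs only by swapped placeholders, which is a smaller pattern to learn than the concrete names. However, a predicted fix is only usable as a patch if every placeholder can be mapped back to a real name.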
Notes
The impact of code representation on learning-based repair replication package. https://github.com/annon-reptory/reptory (2021)
Additional information
Communicated by: Andrea Stocco, Onn Shehory, Gunel Jahangirova, Vincenzo Riccio
This article belongs to the Topical Collection: Special Issue on Software Testing in the Machine Learning Era.
Marjane Namavar and Noor Nashid contributed equally to this work.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Namavar, M., Nashid, N. & Mesbah, A. A controlled experiment of different code representations for learning-based program repair. Empir Software Eng 27, 190 (2022). https://doi.org/10.1007/s10664-022-10223-5
DOI: https://doi.org/10.1007/s10664-022-10223-5