research-article

Neuro-symbolic program corrector for introductory programming assignments

Authors:

Pushmeet Kohli,

Rishabh Singh

ICSE '18: Proceedings of the 40th International Conference on Software Engineering

Pages 60 - 70

https://doi.org/10.1145/3180155.3180219

Published: 27 May 2018

Abstract

Automatic correction of programs is a challenging problem with numerous real-world applications in security, verification, and education. One increasingly important application is correcting student submissions in online courses in order to provide feedback. Most existing program repair techniques analyze Abstract Syntax Trees (ASTs) of programs, which are unavailable for programs with syntax errors. In this paper, we propose a novel neuro-symbolic approach that combines neural networks with constraint-based reasoning. Specifically, our method first uses a Recurrent Neural Network (RNN) to repair syntax errors in the buggy programs; the resulting syntactically fixed programs are then repaired using constraint-based techniques to ensure functional correctness. The RNNs are trained on a corpus of syntactically correct submissions for a given programming assignment, and are then queried to fix syntax errors in an incorrect submission by replacing or inserting the predicted tokens at the error location. We evaluate our technique on a dataset comprising over 14,500 student submissions with syntax errors. Our method repairs syntax errors in 60% (8689) of submissions, and finds functionally correct repairs for 23.8% (3455) of submissions.
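The syntax-repair step described in the abstract (query a model trained on correct submissions, then insert or replace the predicted token at the error location) can be illustrated with a minimal Python sketch. This is not the authors' implementation: a bigram frequency table stands in for the RNN, programs are assumed to be single-line, space-separated token strings, the error index is assumed to be supplied by a parser, and the names `train_bigram_model`, `is_valid`, and `repair` are hypothetical. The constraint-based stage for functional correctness is not covered here.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count which token follows which in syntactically correct programs.

    Stand-in for the RNN: it only conditions on the previous token.
    """
    model = defaultdict(Counter)
    for program in corpus:
        tokens = program.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += 1
    return model


def is_valid(tokens):
    """Use Python's own parser as the syntax oracle."""
    try:
        compile(" ".join(tokens), "<submission>", "exec")
        return True
    except SyntaxError:
        return False


def repair(tokens, error_index, model, top_k=3):
    """Try inserting, then replacing, a predicted token at error_index.

    Returns the repaired token list, or None if no candidate parses.
    """
    if error_index == 0:
        return None  # this toy model needs a preceding token to condition on
    prev = tokens[error_index - 1]
    candidates = model.get(prev, Counter()).most_common(top_k)
    for candidate, _count in candidates:
        inserted = tokens[:error_index] + [candidate] + tokens[error_index:]
        if is_valid(inserted):
            return inserted
        replaced = tokens[:error_index] + [candidate] + tokens[error_index + 1:]
        if is_valid(replaced):
            return replaced
    return None


# Hypothetical corpus of correct submissions for one assignment.
corpus = [
    "def f ( x ) : return x + 1",
    "def g ( y ) : return y * 2",
    "def h ( a , b ) : return a - b",
]
model = train_bigram_model(corpus)

# Buggy submission: the colon is missing before `return` (token index 5).
buggy = "def f ( x ) return x + 1".split()
fixed = repair(buggy, error_index=5, model=model)
print(" ".join(fixed))  # prints: def f ( x ) : return x + 1
```

Insertion is tried before replacement for each candidate, so a missing token is restored without discarding anything the student wrote; replacement only applies when insertion alone does not parse, for example when `;` appears where `:` was intended.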

Index Terms

Neuro-symbolic program corrector for introductory programming assignments
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Software and its engineering
  1. Software creation and management
    1. Search-based software engineering
  2. Software organization and properties
    1. Software functional properties
      1. Correctness
        Functionality

Published In

ICSE '18: Proceedings of the 40th International Conference on Software Engineering

May 2018

1307 pages

ISBN:9781450356381

DOI:10.1145/3180155

Conference Chair: Michel Chaudron (Chalmers University of Technology, University of Gothenburg, Sweden)
General Chair: Ivica Crnkovic (Chalmers University of Technology, University of Gothenburg, Sweden)
Program Chairs: Marsha Chechik (University of Toronto, Canada), Mark Harman (Facebook and University College London, United Kingdom)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICSE '18: 40th International Conference on Software Engineering

May 27 - June 3, 2018

Gothenburg, Sweden

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
