DOI: 10.1145/3180155.3180219
research-article

Neuro-symbolic program corrector for introductory programming assignments

Published: 27 May 2018

ABSTRACT

Automatic correction of programs is a challenging problem with numerous real-world applications in security, verification, and education. One application that is becoming increasingly important is the correction of student submissions in online courses for providing feedback. Most existing program repair techniques analyze Abstract Syntax Trees (ASTs) of programs, which are unfortunately unavailable for programs with syntax errors. In this paper, we propose a novel neuro-symbolic approach that combines neural networks with constraint-based reasoning. Specifically, our method first uses a Recurrent Neural Network (RNN) to perform syntax repairs on the buggy programs; the resulting syntactically fixed programs are then repaired using constraint-based techniques to ensure functional correctness. The RNNs are trained on a corpus of syntactically correct submissions for a given programming assignment, and are then queried to fix syntax errors in an incorrect submission by replacing or inserting the predicted tokens at the error location. We evaluate our technique on a dataset of over 14,500 student submissions with syntax errors. Our method repairs syntax errors in 60% (8,689) of the submissions and finds functionally correct repairs for 23.8% (3,455) of them.
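The syntax-repair step described in the abstract (train a token model on correct submissions, then insert or replace a predicted token at the error location) can be illustrated with a small sketch. This is not the authors' implementation: a bigram frequency table stands in for the RNN, Python's built-in `compile` stands in for the syntax checker, and the names `train_bigram`, `predict`, `repair_at`, and `repair` are hypothetical. The sketch also scans every position when no error location is supplied, which is a simplification.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count next-token frequencies over tokenized, syntactically correct programs.
    Assumption: this table is a stand-in for the RNN token model."""
    model = defaultdict(Counter)
    for tokens in corpus:
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += 1
    return model

def predict(model, prev, k=3):
    """Return up to k most frequent tokens seen after `prev`."""
    return [tok for tok, _ in model.get(prev, Counter()).most_common(k)]

def is_valid(tokens):
    """Check syntax only; the code is compiled but never executed."""
    try:
        compile(" ".join(tokens), "<submission>", "exec")
        return True
    except SyntaxError:
        return False

def repair_at(model, tokens, idx):
    """Try inserting, then replacing, a predicted token at position idx."""
    for cand in predict(model, tokens[idx - 1]):
        inserted = tokens[:idx] + [cand] + tokens[idx:]
        if is_valid(inserted):
            return inserted
        replaced = tokens[:idx] + [cand] + tokens[idx + 1:]
        if is_valid(replaced):
            return replaced
    return None

def repair(model, tokens):
    """Simplification: scan all positions instead of using a parser-reported location."""
    for idx in range(1, len(tokens) + 1):
        fixed = repair_at(model, tokens, idx)
        if fixed is not None:
            return fixed
    return None

# Toy corpus of correct one-line "submissions" (illustrative data only).
corpus = [
    ["x", "=", "(", "1", "+", "2", ")"],
    ["y", "=", "(", "3", "+", "4", ")"],
    ["print", "(", "x", ")"],
]
model = train_bigram(corpus)
print(repair(model, ["x", "=", "(", "1", "+", "2"]))   # missing closing parenthesis
print(repair(model, ["print", "(", "x", "]"]))         # wrong closing bracket
```

The first call is fixed by an insertion and the second by a replacement, mirroring the two edit kinds named in the abstract. The sketch stops at syntactic validity; the constraint-based stage that establishes functional correctness is not modeled here.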

Published in

ICSE '18: Proceedings of the 40th International Conference on Software Engineering
May 2018
1307 pages
ISBN: 9781450356381
DOI: 10.1145/3180155
Conference Chair: Michel Chaudron
General Chair: Ivica Crnkovic
Program Chairs: Marsha Chechik, Mark Harman

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 276 of 1,856 submissions, 15%
