Syntax errors just aren't natural: improving error reporting with language models

Published: 31 May 2014

Abstract

A frustrating aspect of software development is that compiler error messages often fail to locate the actual cause of a syntax error. An errant semicolon or brace can result in many errors reported throughout the file. We seek to find the actual source of these syntax errors by relying on the consistency of software: valid source code is usually repetitive and unsurprising. We exploit this consistency by constructing a simple N-gram language model of lexed source code tokens. We implemented an automatic Java syntax-error locator using the corpus of the project itself and evaluated its performance on mutated source code from several projects. Our tool, trained on the past versions of a project, can effectively augment the syntax error locations produced by the native compiler. Thus we provide a methodology and tool that exploits the naturalness of software source code to detect syntax errors alongside the parser.
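The idea in the abstract can be illustrated with a minimal sketch: count N-grams over lexed tokens from known-good code, then rank each token position in a new file by how surprising it is to the model. This is not the authors' tool. The function names (`train_ngrams`, `surprisal`, `rank_suspicious`), the abstract token labels such as `ID` and `NUM`, the trigram order, and the add-one smoothing are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter

def train_ngrams(token_streams, n=3):
    """Count n-grams and their (n-1)-token contexts over lexed token streams."""
    ngrams, contexts, vocab = Counter(), Counter(), set()
    for tokens in token_streams:
        padded = ["<s>"] * (n - 1) + list(tokens)
        vocab.update(padded)
        for i in range(len(padded) - n + 1):
            gram = tuple(padded[i:i + n])
            ngrams[gram] += 1
            contexts[gram[:-1]] += 1
    # Reserve one extra vocabulary slot for tokens never seen in training.
    return ngrams, contexts, len(vocab) + 1

def surprisal(gram, ngrams, contexts, vocab_size):
    """Bits of surprise for the last token given its context (add-one smoothing, an assumption)."""
    p = (ngrams[gram] + 1) / (contexts[gram[:-1]] + vocab_size)
    return -math.log2(p)

def rank_suspicious(tokens, model, n=3):
    """Return (surprisal, index, token) tuples, most surprising position first."""
    ngrams, contexts, vocab_size = model
    padded = ["<s>"] * (n - 1) + list(tokens)
    scores = []
    for i, tok in enumerate(tokens):
        gram = tuple(padded[i:i + n])  # n-gram ending at tokens[i]
        scores.append((surprisal(gram, ngrams, contexts, vocab_size), i, tok))
    return sorted(scores, reverse=True)

# Hypothetical corpus: the same well-formed token stream seen 20 times.
corpus = [["int", "ID", "=", "NUM", ";", "return", "ID", ";"]] * 20
model = train_ngrams(corpus)

# A mutated stream with a stray "=" inserted at index 3.
broken = ["int", "ID", "=", "=", "NUM", ";", "return", "ID", ";"]
for score, index, token in rank_suspicious(broken, model)[:2]:
    print(f"index {index} token {token!r} surprisal {score:.2f} bits")
```

In this toy setup the stray `=` at index 3 ranks first, followed by the token right after it, because neither trigram occurs in the training corpus. A real locator would need a proper lexer, a larger corpus, and a better smoothing method than add-one.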

Published In

MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories
May 2014
427 pages
ISBN: 9781450328630
DOI: 10.1145/2597073
General Chair: Premkumar Devanbu
Program Chairs: Sung Kim, Martin Pinzger
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

  • TCSE: IEEE Computer Society's Tech. Council on Software Engin.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NLP
  2. error location
  3. language
  4. n-grams
  5. naturalness
  6. syntax

Qualifiers

  • Article

Conference

ICSE '14