Article

Syntax errors just aren't natural: improving error reporting with language models

Authors:

Joshua Charles Campbell,

José Nelson Amaral

MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories

Pages 252 - 261

https://doi.org/10.1145/2597073.2597102

Published: 31 May 2014

Abstract

A frustrating aspect of software development is that compiler error messages often fail to locate the actual cause of a syntax error. An errant semicolon or brace can result in many errors reported throughout the file. We seek to find the actual source of these syntax errors by relying on the consistency of software: valid source code is usually repetitive and unsurprising. We exploit this consistency by constructing a simple N-gram language model of lexed source code tokens. We implemented an automatic Java syntax-error locator using the corpus of the project itself and evaluated its performance on mutated source code from several projects. Our tool, trained on the past versions of a project, can effectively augment the syntax error locations produced by the native compiler. Thus we provide a methodology and tool that exploits the naturalness of software source code to detect syntax errors alongside the parser.
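The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' tool: the class `NGramModel`, the helper `most_surprising`, the add-one smoothing, and the whitespace tokenization (standing in for a real Java lexer) are all assumptions made for illustration. The sketch trains a trigram model on valid token sequences and then flags the position in a new token sequence whose trigram the model finds least likely.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NGramModel:
    """Hypothetical n-gram model over lexed tokens with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()    # counts of full n-grams
        self.context_counts = Counter()  # counts of the (n-1)-token prefixes
        self.vocab = set()

    def train(self, tokens):
        self.vocab.update(tokens)
        for gram in ngrams(tokens, self.n):
            self.ngram_counts[gram] += 1
            self.context_counts[gram[:-1]] += 1

    def surprisal(self, gram):
        """Bits of surprise for the last token of gram given its prefix."""
        v = len(self.vocab) + 1  # one extra slot for unseen tokens (assumption)
        p = (self.ngram_counts[gram] + 1) / (self.context_counts[gram[:-1]] + v)
        return -math.log2(p)


def most_surprising(model, tokens):
    """Return (token index, bits) for the end of the least likely n-gram."""
    scored = [(i + model.n - 1, model.surprisal(g))
              for i, g in enumerate(ngrams(tokens, model.n))]
    return max(scored, key=lambda s: s[1])


# Train on repeated valid code, then probe a copy with one extra ')'.
corpus = "for ( int i = 0 ; i < n ; i ++ ) { sum += i ; }".split()
model = NGramModel(n=3)
model.train(corpus * 5)

broken = "for ( int i = 0 ; i < n ; i ++ ) ) { sum += i ; }".split()
idx, bits = most_surprising(model, broken)
print(idx, broken[idx], round(bits, 2))
```

In this toy setting the highest-scoring position is the inserted token itself. A practical locator would need a proper lexer, a larger corpus, and a better smoothing scheme, and it would report a ranked list of suspicious locations alongside the compiler's own messages.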

Cited By

Taylor Z (2024). Using Chat GPT to Clean Qualitative Interview Transcriptions: A Usability and Feasibility Analysis. American Journal of Qualitative Research 8:2 (153-160). Online publication date: 2024.
https://doi.org/10.29333/ajqr/14487
Alasmari O, Singer J, Bikanga Ada M (2024). New Supportive Features for the Online Coding Tutorial Systems: The Learners and Educators Perspectives. Proceedings of the 2024 the 16th International Conference on Education Technology and Computers (226-230). Online publication date: 18-Sep-2024.
https://dl.acm.org/doi/10.1145/3702163.3702418
Alasmari O, Singer J, Ada M (2024). Python OCTS: Design, Implementation, and Evaluation of an Online Coding Tutorial System Prototype. 2024 IEEE World Engineering Education Conference (EDUNINE) (1-6). Online publication date: 10-Mar-2024.
https://doi.org/10.1109/EDUNINE60625.2024.10500548

Index Terms

Syntax errors just aren't natural: improving error reporting with language models
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Published In

MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories

May 2014

427 pages

ISBN:9781450328630

DOI:10.1145/2597073

General Chair: Premkumar Devanbu, University of California at Davis, USA

Program Chairs: Sung Kim, Hong Kong University of Science and Technology, China; Martin Pinzger, University of Klagenfurt, Austria

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

TCSE: IEEE Computer Society's Tech. Council on Software Engin.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICSE '14

Sponsor:

SIGSOFT

ICSE '14: 36th International Conference on Software Engineering

May 31 - June 1, 2014

Hyderabad, India

Bibliometrics

Total Citations: 49
Total Downloads: 593
Downloads (last 12 months): 34
Downloads (last 6 weeks): 2

Reflects downloads up to 16 Feb 2025

Cited By

Armengol-Estapé J, Woodruff J, Cummins C, O'Boyle M (2024). SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly. 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (67-80). Online publication date: 2-Mar-2024.
https://doi.org/10.1109/CGO57630.2024.10444788
Sahar H, Bangash A, Hindle A, Barbosa D (2024). IRJIT: A simple, online, information retrieval approach for just-in-time software defect prediction. Empirical Software Engineering 29:5. Online publication date: 2-Aug-2024.
https://doi.org/10.1007/s10664-024-10514-z
Alasmari O, Singer J, Bikanga Ada M (2023). Do Current Online Coding Tutorial Systems Address Novice Programmer Difficulties? Proceedings of the 15th International Conference on Education Technology and Computers (242-248). Online publication date: 26-Sep-2023.
https://dl.acm.org/doi/10.1145/3629296.3629333
Guo Z, Liu S, Liu X, Lai W, Ma M, Zhang X, Ni C, Yang Y, Li Y, Chen L, Zhou G, Zhou Y (2023). Code-line-level Bugginess Identification: How Far have We Come, and How Far have We Yet to Go? ACM Transactions on Software Engineering and Methodology 32:4 (1-55). Online publication date: 27-May-2023.
https://dl.acm.org/doi/10.1145/3582572
Santos E, Prasad P, Becker B, Choppella V, Phatak D, Luxton-Reilly A, Craig M (2023). Always Provide Context: The Effects of Code Context on Programming Error Message Enhancement. Proceedings of the ACM Conference on Global Computing Education Vol 1 (147-153). Online publication date: 5-Dec-2023.
https://dl.acm.org/doi/10.1145/3576882.3617909
Xiao Y, Song W, Qi J, Viswanath B, McDaniel P, Yao D (2023). Specializing Neural Networks for Cryptographic Code Completion Applications. IEEE Transactions on Software Engineering (1-13). Online publication date: 2023.
https://doi.org/10.1109/TSE.2023.3265362
Panda D, Basia P, Nallavolu K, Zhong X, Siy H, Song M (2023). A Statistical Method for API Usage Learning and API Misuse Violation Finding. 2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA) (358-365). Online publication date: 23-May-2023.
https://doi.org/10.1109/SERA57763.2023.10197708
