Correct Me If I'm Wrong: Fixing Grammatical Errors by Preposition Ranking

Authors:

Roman Prokofyev,

Ruslan Mavlyutov,

Gianluca Demartini,

Philippe Cudré-Mauroux

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

Pages 331 - 340

https://doi.org/10.1145/2661829.2661942

Published: 03 November 2014

Abstract

The detection and correction of grammatical errors remain very hard problems for modern error-correction systems. For example, the top-performing systems at the CoNLL-2013 preposition correction challenge achieved an F1 score of only 17%. In this paper, we propose and extensively evaluate a series of approaches for correcting prepositions, analyzing a large body of high-quality textual content to capture language usage. Leveraging n-gram statistics, association measures, and machine learning techniques, our system learns which words or phrases govern the usage of a specific preposition. Our approach makes heavy use of n-gram statistics generated from very large textual corpora. In particular, one of our key features is the use of n-gram association measures (e.g., Pointwise Mutual Information) between words and prepositions to generate better aggregated preposition rankings for the individual n-grams. We evaluate the effectiveness of our approach using cross-validation with different feature combinations and on two test collections created from a set of English language exams and StackExchange forums. We also compare against state-of-the-art supervised methods. Experimental results on the CoNLL-2013 test collection show that our approach to preposition correction achieves an F1 score of ∼30%, an absolute improvement of 13 percentage points over the best-performing approach at that challenge.
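To make the idea of ranking prepositions by an association measure concrete, here is a minimal Python sketch of Pointwise Mutual Information (PMI) between context words and candidate prepositions. It is not the system described in the abstract: the co-occurrence counts are invented toy data, the names `pmi_table`, `rank_prepositions`, and `suggest` are hypothetical, and aggregating by mean PMI with a fixed floor for unseen pairs is an assumption made only for illustration.

```python
import math
from collections import Counter

# Toy (word, preposition) co-occurrence counts, invented for illustration.
# A real system would read such counts from a very large n-gram corpus.
PAIR_COUNTS = Counter({
    ("depend", "on"): 90,
    ("depend", "of"): 5,
    ("depend", "in"): 5,
    ("interested", "in"): 80,
    ("interested", "on"): 10,
    ("interested", "of"): 10,
})


def pmi_table(pair_counts):
    """Return PMI(word, prep) = log2(P(word, prep) / (P(word) * P(prep)))."""
    total = sum(pair_counts.values())
    word_totals, prep_totals = Counter(), Counter()
    for (word, prep), n in pair_counts.items():
        word_totals[word] += n
        prep_totals[prep] += n
    return {
        (word, prep): math.log2(n * total / (word_totals[word] * prep_totals[prep]))
        for (word, prep), n in pair_counts.items()
    }


def rank_prepositions(context_words, candidates, pmi, unseen=-10.0):
    """Rank candidate prepositions by mean PMI with the context words, best first.

    Averaging and the `unseen` floor are assumptions of this sketch.
    """
    if not context_words:
        raise ValueError("at least one context word is required")
    scored = []
    for prep in candidates:
        scores = [pmi.get((word, prep), unseen) for word in context_words]
        scored.append((prep, sum(scores) / len(scores)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def suggest(context_words, written_prep, candidates, pmi):
    """Return the top-ranked preposition if it differs from the written one, else None."""
    best, _ = rank_prepositions(context_words, candidates, pmi)[0]
    return best if best != written_prep else None


PMI = pmi_table(PAIR_COUNTS)
CANDIDATES = ["in", "of", "on"]
print(rank_prepositions(["depend"], CANDIDATES, PMI))
print(suggest(["depend"], "of", CANDIDATES, PMI))
```

With these toy counts, "depend of" would be flagged and "on" proposed instead. A learned model, as the abstract describes, would presumably combine such rankings with other features rather than trust the top PMI score alone.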

Cited By

Olayiwola A, Olayiwola D, Oyedeji A (2024). Development of an automatic grammar checker for Yorùbá word processing using Government and Binding Theory. Expert Systems with Applications: An International Journal, 236:C. Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.1016/j.eswa.2023.121351

Bhagat P, Varde A, Feldman A (2019). WordPrep: Word-based Preposition Prediction Tool. 2019 IEEE International Conference on Big Data (Big Data), pages 2169-2176. Online publication date: Dec-2019.
https://doi.org/10.1109/BigData47090.2019.9005608

Index Terms

Correct Me If I'm Wrong: Fixing Grammatical Errors by Preposition Ranking
1. Applied computing
  1. Document management and text processing
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources

Published In

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

November 2014

2152 pages

ISBN:9781450325981

DOI:10.1145/2661829

General Chairs:
Jianzhong Li (Harbin Inst. of Technology),
X. Sean Wang (Fudan University)

Program Chairs:
Minos Garofalakis (Technical University of Crete, Greece),
Ian Soboroff (National Institute of Standards, USA),
Torsten Suel (New York University, USA),
Min Wang (Google Research, USA)

Copyright © 2014.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Swiss National Science Foundation

Conference

CIKM '14

CIKM '14: 2014 ACM Conference on Information and Knowledge Management

November 3 - 7, 2014

Shanghai, China

Acceptance Rates

CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
