Abstract
This paper reports an experiment on Web-assisted detection and correction of malapropisms. A malapropos word semantically destroys the collocation it occurs in, while usually retaining its syntactic links with the other words. One hundred English malapropisms were gathered, each supplied with its correction candidates, i.e., word combinations in which one word is an editing variant of the corresponding word in the malapropism. Google statistics of occurrences and co-occurrences were gathered for each malapropism and each correction candidate. Because the components of a collocation may be adjacent or separated by other words in a sentence, the statistics were accumulated for the most probable distance between them. The raw Google occurrence statistics were then converted to numeric values of a specially defined Semantic Compatibility Index (SCI). Heuristic rules are proposed that signal a malapropism when its SCI value is lower than a predetermined threshold and that retain a few of the highest SCI-ranked correction candidates. Within certain limitations, the experiment gave promising results.
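The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the SCI formula, so the `compatibility` function below is a stand-in (a log ratio of the pair count to the geometric mean of the word counts). The function names, the threshold, and all counts are hypothetical toy values rather than Google statistics, and the search for the most probable distance between components is assumed to have been done already when the pair counts were collected.

```python
import math
import string


def editing_variants(word, vocabulary):
    """Return vocabulary words that are one elementary edit away from `word`."""
    letters = string.ascii_lowercase
    candidates = set()
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        if right:
            candidates.add(left + right[1:])  # deletion
            candidates.update(left + c + right[1:] for c in letters)  # substitution
        if len(right) > 1:
            candidates.add(left + right[1] + right[0] + right[2:])  # transposition
        candidates.update(left + c + right for c in letters)  # insertion
    candidates.discard(word)
    return sorted(candidates & set(vocabulary))


def compatibility(pair_count, left_count, right_count):
    """Stand-in score (NOT the paper's SCI): log2 of the pair count over
    the geometric mean of the two word counts; -inf if any count is zero."""
    if pair_count == 0 or left_count == 0 or right_count == 0:
        return float("-inf")
    return math.log2(pair_count / math.sqrt(left_count * right_count))


def check_pair(left, right, word_counts, pair_counts, threshold, top_k=3):
    """Flag the pair if its score is below `threshold`; if flagged, return up
    to `top_k` correction candidates whose scores reach the threshold."""
    def score(a, b):
        return compatibility(pair_counts.get((a, b), 0),
                             word_counts.get(a, 0),
                             word_counts.get(b, 0))

    original = score(left, right)
    if original >= threshold:
        return {"score": original, "malapropos": False, "candidates": []}

    # A candidate replaces exactly one of the two words by an editing variant.
    pairs = [(v, right) for v in editing_variants(left, word_counts)]
    pairs += [(left, v) for v in editing_variants(right, word_counts)]
    scored = [(p, score(*p)) for p in pairs]
    kept = sorted((ps for ps in scored if ps[1] >= threshold),
                  key=lambda ps: ps[1], reverse=True)[:top_k]
    return {"score": original, "malapropos": True, "candidates": kept}


# Hypothetical toy counts, made up for illustration only.
WORD_COUNTS = {"travel": 1000, "word": 2000, "world": 4000,
               "ward": 800, "gravel": 100}
PAIR_COUNTS = {("travel", "word"): 2, ("travel", "world"): 800,
               ("travel", "ward"): 1}

result = check_pair("travel", "word", WORD_COUNTS, PAIR_COUNTS, threshold=-5.0)
print(result["malapropos"], [pair for pair, _ in result["candidates"]])
```

With these toy counts the pair "travel ... word" falls below the threshold and only "travel ... world" is retained as a candidate. In practice both the threshold and the scoring function would have to be calibrated on real occurrence statistics.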
Work done under partial support of the Mexican Government (CONACyT, SNI) and CGEPI-IPN, Mexico. Many thanks to Denis Filatov, Alexander Gelbukh, and Patrick Cassidy for their help with manuscript preparation.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bolshakov, I.A., Galicia-Haro, S.N. (2005). Web-Assisted Detection and Correction of Joint and Disjoint Malapropos Word Combinations. In: Montoyo, A., Muñoz, R., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428817_12
DOI: https://doi.org/10.1007/11428817_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26031-8
Online ISBN: 978-3-540-32110-1
eBook Packages: Computer Science, Computer Science (R0)