Abstract
This paper presents a method for the automatic detection of “non-trivial” word combinations in text. The method is based on automatic syntactic analysis. Tested on a text in Spanish, it shows better precision and recall than the baseline method (bigrams). It can be used to enrich very large dictionaries of word combinations.
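To make the baseline and the evaluation measures concrete, the following is a minimal sketch of an adjacent-bigram extractor scored by precision and recall. It is not the authors' implementation: the function names, the toy sentence, and the gold set are all hypothetical and serve only as illustration.

```python
from collections import Counter


def bigram_candidates(tokens, min_count=1):
    """Return the set of adjacent word pairs seen at least min_count times."""
    counts = Counter(zip(tokens, tokens[1:]))
    return {pair for pair, n in counts.items() if n >= min_count}


def precision_recall(candidates, gold):
    """Score a candidate set against a gold-standard set of word combinations."""
    hits = len(candidates & gold)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: a short Spanish token sequence and an invented gold set.
tokens = "prestar atención a la clase y prestar atención al maestro".split()
gold = {("prestar", "atención"), ("atención", "maestro")}

candidates = bigram_candidates(tokens)
precision, recall = precision_recall(candidates, gold)
print(f"candidates={len(candidates)} precision={precision} recall={recall}")
```

In this toy case the gold pair whose words are not adjacent in the text cannot be produced by adjacent bigrams, which lowers recall, while the many function-word pairs lower precision.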
This work was done with partial support from the Mexican Government (CONACyT, SNI), IPN (CGPI, COFAA, PIFI), the Korean Government (KIPA Professorship for Visiting Faculty Positions in Korea), and ITRI of Chung-Ang University. The first author is currently on sabbatical leave at Chung-Ang University. We thank Prof. I. A. Bolshakov for useful discussion.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gelbukh, A., Sidorov, G., Han, SY., Hernández-Rubio, E. (2004). Automatic Syntactic Analysis for Detection of Word Combinations. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_29
DOI: https://doi.org/10.1007/978-3-540-24630-5_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5