Abstract
A method for recognizing syntactic patterns in Spanish is presented. The method is based on dependency parsing: heuristic rules infer dependency relations between words, and word co-occurrence statistics (learnt in an unsupervised manner) resolve ambiguities such as prepositional phrase attachment. If a complete parse cannot be produced, a partial structure is built with some (if not all) dependency relations identified. Evaluation shows that, in spite of its simplicity, the parser’s accuracy is superior to that of the existing parsers available for Spanish. Though certain grammar rules, as well as the lexical resources used, are specific to Spanish, the suggested approach is language-independent.
This work was done with partial support from the Mexican Government (SNI, CGPI-IPN, COFAA-IPN, and PIFI-IPN). The authors cordially thank Jordi Atserias for providing the data on the comparison of the TACAT parser with our system.
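The disambiguation step named in the abstract, resolving prepositional phrase attachment from word co-occurrence statistics, can be illustrated with a minimal sketch. This is not the parser's actual implementation: the function name `attach_pp`, the triple-based count table, the tie-breaking default, and all counts are assumptions invented for illustration, and raw counts stand in for whatever statistic the parser really uses.

```python
from collections import Counter

# Toy co-occurrence counts, invented for illustration (not data from the paper).
# Keys are (head lemma, preposition, dependent noun lemma) triples.
COUNTS = Counter({
    ("ver", "con", "telescopio"): 12,
    ("gato", "con", "telescopio"): 1,
    ("ver", "con", "bigote"): 1,
    ("gato", "con", "bigote"): 9,
})


def attach_pp(verb, noun, prep, pp_noun, counts=COUNTS):
    """Pick the head ('verb' or 'noun') for a prepositional phrase.

    The candidate head that co-occurs more often with (prep, pp_noun) wins.
    Ties, including the case where both counts are zero, fall back to noun
    attachment; that default is an arbitrary choice made for this sketch.
    """
    verb_score = counts[(verb, prep, pp_noun)]  # Counter yields 0 for unseen keys
    noun_score = counts[(noun, prep, pp_noun)]
    return "verb" if verb_score > noun_score else "noun"


if __name__ == "__main__":
    # "Veo al gato con un telescopio" vs. "Veo al gato con bigote"
    print(attach_pp("ver", "gato", "con", "telescopio"))
    print(attach_pp("ver", "gato", "con", "bigote"))
```

With these toy counts, "con telescopio" attaches to the verb and "con bigote" to the noun. A real system would need smoothing or a backoff strategy for unseen triples, which this sketch reduces to a fixed default.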
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Calvo, H., Gelbukh, A. (2006). DILUCT: An Open-Source Spanish Dependency Parser Based on Rules, Heuristics, and Selectional Preferences. In: Kop, C., Fliedl, G., Mayr, H.C., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2006. Lecture Notes in Computer Science, vol 3999. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11765448_15
DOI: https://doi.org/10.1007/11765448_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34616-6
Online ISBN: 978-3-540-34617-3
eBook Packages: Computer Science, Computer Science (R0)