Abstract
This article describes the application of lemmatization and shallow parsing as a linguistically based alternative to stemming in text retrieval, with the aim of managing linguistic variation at both the word level and the phrase level. Several alternatives for selecting index terms from among the syntactic dependencies detected by the parser are evaluated. Although the article focuses on Spanish, the approach can be extended to other languages by adapting the grammar used by the parser.
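As a rough illustration of the idea summarized above, the following minimal sketch shows how lemmas and head-modifier pairs might serve as index terms in place of stems. It is not the authors' system: the toy lexicon, the coarse tags (N, A, D, P), the two shallow patterns (noun + adjective, noun + preposition + noun), and the function names `analyse` and `index_terms` are all assumptions made for this example.

```python
# Hypothetical toy lexicon: surface form -> (lemma, coarse tag).
# A real system would use a full morphological analyser and tagger.
LEXICON = {
    "gato": ("gato", "N"), "gatos": ("gato", "N"),
    "casa": ("casa", "N"), "casas": ("casa", "N"),
    "negro": ("negro", "A"), "negros": ("negro", "A"),
    "el": ("el", "D"), "la": ("el", "D"),
    "los": ("el", "D"), "las": ("el", "D"),
    "de": ("de", "P"),
}


def analyse(text):
    """Return (lemma, tag) pairs for the words found in the toy lexicon."""
    return [LEXICON[w] for w in text.lower().split() if w in LEXICON]


def index_terms(text):
    """Collect content-word lemmas plus (head, modifier) lemma pairs."""
    tokens = analyse(text)
    # Word level: lemmas of nouns and adjectives replace stems.
    terms = {lemma for lemma, tag in tokens if tag in ("N", "A")}
    # Phrase level: two illustrative shallow patterns headed by a noun.
    for i, (lemma, tag) in enumerate(tokens):
        if tag != "N":
            continue
        # Pattern 1: noun immediately followed by an adjective.
        if i + 1 < len(tokens) and tokens[i + 1][1] == "A":
            terms.add((lemma, tokens[i + 1][0]))
        # Pattern 2: noun + preposition + optional determiner + noun.
        j = i + 1
        if j < len(tokens) and tokens[j][1] == "P":
            j += 1
            if j < len(tokens) and tokens[j][1] == "D":
                j += 1
            if j < len(tokens) and tokens[j][1] == "N":
                terms.add((lemma, tokens[j][0]))
    return terms
```

In this sketch, "los gatos negros" and "el gato negro" yield the same set of terms, because inflectional variation is absorbed by the lemmas and the noun-adjective pair is built from lemmas rather than surface forms.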
Supported in part by Ministerio de Ciencia y Tecnología (HF2002-81), FPU grants of Secretaría de Estado de Educación y Universidades (AP2001-2545), Xunta de Galicia (PGIDIT02PXIB30501PR, PGIDIT02SIN01E and PGIDIT03SIN30501PR) and Universidade da Coruña.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vilares, J., Alonso, M.A., Vilares, M. (2004). Morphological and Syntactic Processing for Text Retrieval. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_36
DOI: https://doi.org/10.1007/978-3-540-30075-5_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22936-0
Online ISBN: 978-3-540-30075-5
eBook Packages: Springer Book Archive