Abstract
In this, our second participation in the CLEF Spanish monolingual track, we continue to apply Natural Language Processing techniques for single-word and multi-word term conflation. We tested two conflation approaches. The first lemmatizes the text in order to remove inflectional variation. The second uses syntactic dependencies as complex index terms, in an attempt to address the problems caused by syntactic variation and thereby obtain more precise terms. These dependencies are obtained by a shallow parser based on cascades of finite-state transducers.
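The two conflation ideas can be illustrated with a minimal sketch. It is not the authors' system: the toy lexicon, the one-letter tag set, and the function names are all hypothetical, and a single regular expression over part-of-speech tags stands in for the cascade of finite-state transducers. It only shows how inflectional and syntactic variants of a phrase might collapse to the same index terms.

```python
import re

# Hypothetical toy lexicon: surface form -> (lemma, one-letter POS tag).
# A real system would rely on a tagger-lemmatizer; these entries are illustrative only.
LEXICON = {
    "caída": ("caída", "N"),
    "caídas": ("caída", "N"),
    "venta": ("venta", "N"),
    "ventas": ("venta", "N"),
    "la": ("el", "D"),
    "las": ("el", "D"),
    "de": ("de", "P"),
}


def lemmatize(text):
    """Return (lemma, tag) for each token; unknown tokens keep their form with tag 'X'."""
    tokens = re.findall(r"\w+", text.lower())
    return [LEXICON.get(tok, (tok, "X")) for tok in tokens]


def lemma_terms(text, stop_tags=("D", "P")):
    """Single-word index terms: lemmas of content words only."""
    return [lemma for lemma, tag in lemmatize(text) if tag not in stop_tags]


# Assumed pattern: noun, preposition, optional determiner, noun.
# Each tag is one character, so match offsets are also token positions.
NOUN_PP = re.compile(r"NPD?N")


def dependency_pairs(text):
    """Multi-word index terms: (head lemma, modifier lemma) pairs from noun + PP patterns."""
    analysed = lemmatize(text)
    tags = "".join(tag for _, tag in analysed)
    return [
        (analysed[m.start()][0], analysed[m.end() - 1][0])
        for m in NOUN_PP.finditer(tags)
    ]
```

Under these assumptions, "las caídas de las ventas" and "caída de ventas" yield the same lemmas and the same dependency pair, which is the effect conflation aims for when matching queries against documents.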
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vilares, J., Alonso, M.A., Ribadas, F.J. (2004). COLE Experiments at CLEF 2003 in the Spanish Monolingual Track. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds) Comparative Evaluation of Multilingual Information Access Systems. CLEF 2003. Lecture Notes in Computer Science, vol 3237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30222-3_33
DOI: https://doi.org/10.1007/978-3-540-30222-3_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24017-4
Online ISBN: 978-3-540-30222-3
eBook Packages: Springer Book Archive