COLE Experiments at CLEF 2003 in the Spanish Monolingual Track

  • Conference paper
Comparative Evaluation of Multilingual Information Access Systems (CLEF 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3237)

Abstract

In this, our second participation in the CLEF Spanish monolingual track, we have continued to apply Natural Language Processing techniques to the conflation of single-word and multi-word terms. Two conflation approaches were tested. The first is based on lemmatization of the text, in order to remove inflectional variation. The second uses syntactic dependencies as complex index terms, in an attempt to solve the problems derived from syntactic variation and thereby obtain more precise terms. These dependencies are obtained by a shallow parser based on cascades of finite-state transducers.
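
The two kinds of index terms described above can be illustrated with a small sketch. It is only a toy under stated assumptions: the lexicon `TOY_LEMMAS`, the set `CONTENT_LEMMAS`, the example phrases, and the single "noun de noun" pattern are all hypothetical, and the hand-written loop stands in for a real tagger-lemmatizer and a finite-state shallow parser. It is not the system evaluated in the paper.

```python
# Toy illustration of lemma-based and dependency-based index terms.
# Everything here (lexicon, pattern, function names) is a hypothetical
# stand-in, not the lemmatizer or parser used in the paper.

TOY_LEMMAS = {
    "caídas": "caída",
    "caída": "caída",
    "ventas": "venta",
    "venta": "venta",
    "las": "el",
    "la": "el",
    "de": "de",
}

# Lemmas treated as content words (nouns) in this toy example.
CONTENT_LEMMAS = {"caída", "venta"}


def lemmatize(tokens):
    """Map each token to its lemma; unknown tokens fall back to lowercase."""
    return [TOY_LEMMAS.get(t.lower(), t.lower()) for t in tokens]


def single_word_terms(tokens):
    """First approach: index the lemmas of content words."""
    return [lemma for lemma in lemmatize(tokens) if lemma in CONTENT_LEMMAS]


def dependency_pairs(tokens):
    """Second approach (simplified): index (head, modifier) pairs for the
    pattern noun + "de" + optional determiner + noun."""
    lemmas = lemmatize(tokens)
    pairs = []
    for i, head in enumerate(lemmas):
        if head not in CONTENT_LEMMAS:
            continue
        j = i + 1
        if j < len(lemmas) and lemmas[j] == "de":
            j += 1
            # Skip an optional determiner, already lemmatized to "el".
            if j < len(lemmas) and lemmas[j] == "el":
                j += 1
            if j < len(lemmas) and lemmas[j] in CONTENT_LEMMAS:
                pairs.append((head, lemmas[j]))
    return pairs


variant_a = ["caídas", "de", "las", "ventas"]
variant_b = ["caída", "de", "ventas"]
print(single_word_terms(variant_a))
print(dependency_pairs(variant_a) == dependency_pairs(variant_b))
```

Under these assumptions, the two surface variants differ in number and in the presence of a determiner, yet they conflate to the same pair of lemmas, which is the sense in which a dependency can act as a single, more precise index term.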

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vilares, J., Alonso, M.A., Ribadas, F.J. (2004). COLE Experiments at CLEF 2003 in the Spanish Monolingual Track. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds) Comparative Evaluation of Multilingual Information Access Systems. CLEF 2003. Lecture Notes in Computer Science, vol 3237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30222-3_33

  • DOI: https://doi.org/10.1007/978-3-540-30222-3_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24017-4

  • Online ISBN: 978-3-540-30222-3

  • eBook Packages: Springer Book Archive
