Abstract
This paper presents the results of the BUAP/UPV universities in WebCLEF, a task of CLEF 2005. Specifically, we evaluate our information retrieval system on the bilingual "English to Spanish" task. Our system uses a term reduction process based on the Transition Point technique. Our results show that it is possible to reduce the number of terms to index, thereby improving the performance of our system. We evaluate different reduction percentages over a subset of EuroGOV in order to determine the best one. After reducing the corpus by 82.55%, we obtained a Mean Reciprocal Rank of 0.0844, compared with 0.0465 for the same evaluation with full documents.
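For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Mean Reciprocal Rank can be computed, assuming each query has exactly one target document and the system returns a ranked list of document identifiers. The function names and the toy data are hypothetical illustrations and are not taken from the TPIRS system or the WebCLEF evaluation scripts.

```python
from typing import List, Sequence, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], target_id: str) -> float:
    """Return 1/rank of target_id in ranked_ids (ranks start at 1).

    Returns 0.0 when the target does not appear in the ranking.
    """
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == target_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], str]]) -> float:
    """Average the reciprocal rank over all (ranking, target) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, t) for r, t in runs) / len(runs)


# Hypothetical toy data: three queries, each with one known target page.
toy_runs = [
    (["a", "b", "c"], "b"),  # target at rank 2 contributes 1/2
    (["x", "y"], "x"),       # target at rank 1 contributes 1
    (["p", "q"], "z"),       # target not retrieved contributes 0
]
toy_mrr = mean_reciprocal_rank(toy_runs)
```

Under this definition, a query whose target is missing from the ranking contributes zero, so the mean over many queries can be small even when some targets are ranked first.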
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pinto, D., Jiménez-Salazar, H., Rosso, P., Sanchis, E. (2006). BUAP-UPV TPIRS: A System for Document Indexing Reduction at WebCLEF. In: Peters, C., et al. Accessing Multilingual Information Repositories. CLEF 2005. Lecture Notes in Computer Science, vol 4022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11878773_96
DOI: https://doi.org/10.1007/11878773_96
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45697-1
Online ISBN: 978-3-540-45700-8
eBook Packages: Computer Science (R0)