BUAP-UPV TPIRS: A System for Document Indexing Reduction at WebCLEF

  • Conference paper
Accessing Multilingual Information Repositories (CLEF 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4022)

Abstract

This paper presents the results of the BUAP/UPV team in WebCLEF, a task of CLEF 2005. Specifically, we evaluate our information retrieval system on the bilingual “English to Spanish” task. The system applies a term reduction process based on the Transition Point technique. Our results show that it is possible to reduce the number of terms to index and, in doing so, improve the performance of the system. We evaluate several reduction percentages over a subset of EuroGOV in order to determine the best one. After reducing the corpus by 82.55%, we obtained a Mean Reciprocal Rank of 0.0844, compared with 0.0465 for the same evaluation with full documents.
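For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Mean Reciprocal Rank can be computed. It assumes each topic has exactly one target page, which is an assumption of this sketch and not something the abstract states. The function names and the toy document identifiers are hypothetical and are not taken from the authors' system.

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranking: Sequence[Hashable], target: Hashable) -> float:
    """Return 1/position of the target in the ranking, or 0.0 if it is absent."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id == target:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
) -> float:
    """Average the reciprocal rank over all topics.

    Assumption of this sketch: one target document per topic.
    """
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, t) for r, t in zip(rankings, targets))
    return total / len(rankings)


# Toy data, invented for illustration only.
toy_rankings = [["a", "b"], ["x", "y", "z"], ["p"]]
toy_targets = ["b", "x", "q"]
toy_mrr = mean_reciprocal_rank(toy_rankings, toy_targets)
```

In the toy data the target is found at rank 2 for the first topic, at rank 1 for the second, and not at all for the third, so the three reciprocal ranks are 0.5, 1.0 and 0.0. A topic whose target is never retrieved contributes zero, which is why scores over many difficult topics can be small.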

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pinto, D., Jiménez-Salazar, H., Rosso, P., Sanchis, E. (2006). BUAP-UPV TPIRS: A System for Document Indexing Reduction at WebCLEF. In: Peters, C., et al. Accessing Multilingual Information Repositories. CLEF 2005. Lecture Notes in Computer Science, vol 4022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11878773_96

  • DOI: https://doi.org/10.1007/11878773_96

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45697-1

  • Online ISBN: 978-3-540-45700-8

  • eBook Packages: Computer Science, Computer Science (R0)
