GIR with Language Modeling and DFR Using Terrier

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5706)

Abstract

This paper reports on experiments in the Monolingual English, German and Portuguese collection tasks that supplement those described in the CLEF 2008 Working Notes. Experiments were performed using the language modeling approach and the Divergence From Randomness (DFR) InL2 model as implemented in Terrier (TERabyte RetrIEveR) version 2.1. The purpose was twofold: 1) to compare these approaches to determine their impact on retrieval performance and 2) to compare results from these experiments with those of the first set of experiments to determine whether query expansion and the presence or absence of diacritic marks affect retrieval performance. The stopword list provided by Terrier was used to index all the collections. We removed diacritic marks from the German and Portuguese topics and collections before indexing and retrieval. Topics were processed automatically, and queries were built from the title and description tags. Query expansion used the 20 top-ranked documents and 40 terms; these parameters were selected arbitrarily. Results show that the DFR InL2 model outperformed language modeling for all the languages. Results of the new experiments with query expansion show an improvement in retrieval performance for all the languages. They also suggest that removing diacritic marks may have an impact in the case of German and Portuguese.
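
The abstract states that diacritic marks were removed from the German and Portuguese topics and collections but does not say how. The following is a minimal sketch of one common way to do this, assuming Unicode decomposition followed by dropping combining marks; the function name `strip_diacritics` is hypothetical and this is not necessarily the procedure used in the paper.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining diacritic marks removed.

    Assumption: diacritics are removed by canonical decomposition (NFD),
    dropping every combining character, then recomposing (NFC).
    """
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for sample in ("München", "São Paulo", "Straße"):
        print(sample, "->", strip_diacritics(sample))
```

With this approach the German "ß" is left unchanged, because it is a separate letter rather than a base letter plus a combining mark; a pipeline that also wants "ss" would need an explicit mapping.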

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guillén, R. (2009). GIR with Language Modeling and DFR Using Terrier. In: Peters, C., et al. Evaluating Systems for Multilingual and Multimodal Information Access. CLEF 2008. Lecture Notes in Computer Science, vol 5706. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04447-2_107

  • DOI: https://doi.org/10.1007/978-3-642-04447-2_107

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04446-5

  • Online ISBN: 978-3-642-04447-2

  • eBook Packages: Computer Science (R0)
