Abstract
This paper reports on experiments in the monolingual English, German and Portuguese tasks that extend those described in the CLEF 2008 Working Notes. The experiments used the language modeling approach and the Divergence From Randomness (DFR) InL2 model as implemented in Terrier (TERabyte RetrIEveR) version 2.1. The purpose was twofold: 1) to compare the two approaches and determine their impact on retrieval performance, and 2) to compare the results with those of the first set of experiments and determine whether query expansion and the presence or absence of diacritic marks affect retrieval performance. The stopword list provided by Terrier was used to index all the collections. We removed diacritic marks from the German and Portuguese topics and collections before indexing and retrieval. Topics were processed automatically, and queries were built from the title and description fields. Query expansion used the 20 top-ranked documents and 40 terms; these parameters were selected arbitrarily. Results show that the DFR InL2 model outperformed language modeling for all the languages. The new experiments with query expansion show an improvement in retrieval performance for all the languages. They also suggest that removing diacritic marks may have an impact in the case of German and Portuguese.
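The abstract does not say how diacritic marks were removed. As a minimal sketch of one common way to do it, assuming Unicode canonical decomposition is acceptable, the following Python function drops combining marks from a string. The function name `strip_diacritics` is hypothetical; this is neither part of Terrier nor the author's actual preprocessing code.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining diacritic marks removed.

    Assumption: diacritics are removed by canonical decomposition (NFD)
    followed by dropping combining characters. Letters that have no
    decomposition, such as the German sharp s, are left unchanged.
    """
    # Split each accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose anything that remains decomposed for a stable representation.
    return unicodedata.normalize("NFC", base_only)


if __name__ == "__main__":
    for sample in ["München", "São Paulo", "Straße", "coração"]:
        print(sample, "->", strip_diacritics(sample))
```

Under this assumption the same function would be applied to both the topics and the collection text, so that query terms and indexed terms are normalized identically.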
© 2009 Springer-Verlag Berlin Heidelberg
Guillén, R. (2009). GIR with Language Modeling and DFR Using Terrier. In: Peters, C., et al. Evaluating Systems for Multilingual and Multimodal Information Access. CLEF 2008. Lecture Notes in Computer Science, vol 5706. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04447-2_107
DOI: https://doi.org/10.1007/978-3-642-04447-2_107
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04446-5
Online ISBN: 978-3-642-04447-2
eBook Packages: Computer Science, Computer Science (R0)