Abstract
This work describes a variation on the traditional Information Retrieval paradigm in which text documents are indexed not by their content but by the search terms that previous users employed to find them. We evaluate the effectiveness of this approach by indexing a sample of query logs from the European Library, and describe its usefulness for multilingual searching. In our analysis of the search logs, we automatically determine the language of past queries and annotate the logs accordingly. From this information we derive matrices showing that users a) tend to persist with the same query language throughout a query session, and b) tend to submit queries in the same language as the interface they have selected.
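The idea of indexing documents by the queries that previously led to them can be illustrated with a minimal sketch. This is not the authors' implementation: the log records, document identifiers, and the functions `build_index` and `search` are hypothetical, and the scoring (a simple count of past term-to-document associations) is an assumption made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (query text, identifier of the record the user then viewed).
QUERY_LOG = [
    ("mozart symphony", "doc1"),
    ("mozart sinfonie", "doc1"),
    ("medieval maps", "doc2"),
    ("cartes anciennes", "doc2"),
    ("mozart letters", "doc3"),
]


def build_index(log):
    """Map each past query term to the documents it led to, with counts."""
    index = defaultdict(Counter)
    for query, doc_id in log:
        for term in query.lower().split():
            index[term][doc_id] += 1
    return index


def search(index, query):
    """Rank documents by how often the query's terms previously led to them."""
    scores = Counter()
    for term in query.lower().split():
        # Terms never seen in the log contribute nothing to the scores.
        scores.update(index.get(term, {}))
    return [doc_id for doc_id, _ in scores.most_common()]


index = build_index(QUERY_LOG)
print(search(index, "mozart sinfonie"))
print(search(index, "cartes"))
```

Because the index is built from query terms rather than document text, a document can be retrieved by a query in any language that earlier users happened to use for it, which is the property the abstract points to for multilingual searching.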
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oakes, M., Xu, Y. (2010). A Search Engine Based on Query Logs, and Search Log Analysis by Automatic Language Identification. In: Peters, C., et al. Multilingual Information Access Evaluation I. Text Retrieval Experiments. CLEF 2009. Lecture Notes in Computer Science, vol 6241. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15754-7_64
DOI: https://doi.org/10.1007/978-3-642-15754-7_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15753-0
Online ISBN: 978-3-642-15754-7
eBook Packages: Computer Science, Computer Science (R0)