A Search Engine Based on Query Logs, and Search Log Analysis by Automatic Language Identification

Oakes, Michael; Xu, Yan

doi:10.1007/978-3-642-15754-7_64

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6241)

Abstract

This work describes a variation on the traditional Information Retrieval paradigm in which text documents are indexed not according to their content but according to the search terms that previous users entered when finding them. We determine the effectiveness of this approach by indexing a sample of query logs from the European Library, and we describe its usefulness for multilingual searching. In our analysis of the search logs, we identify the language of past queries automatically and annotate the logs accordingly. From this information, we derive matrices showing that users (a) tend to persist with the same query language throughout a query session, and (b) tend to submit queries in the same language as the interface they have selected.
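The abstract names three steps without giving their implementation: indexing documents by the queries that led users to them, identifying the language of each logged query, and tabulating language matrices. The following is a minimal Python sketch of those steps under stated assumptions, not the authors' system. It assumes a hypothetical log format of `(session_id, interface_language, query, clicked_doc_id)` records in time order. It also uses a character-trigram naive Bayes classifier with add-one smoothing as a stand-in language identifier, trained on hypothetical toy sentences. `TRAINING_TEXT`, `LOG`, and every function and class name are illustrative inventions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training sentences, one per language. A real identifier
# would be trained on far more text; these exist only to make the sketch run.
TRAINING_TEXT = {
    "en": "the history of the war and the world of music",
    "fr": "l'histoire de la guerre et le monde de la musique",
    "de": "die geschichte des krieges und die welt der musik",
}

# Hypothetical log records, in time order:
# (session_id, interface_language, query, clicked_doc_id)
LOG = [
    ("s1", "en", "history of music", "doc1"),
    ("s1", "en", "history of the war", "doc2"),
    ("s2", "fr", "histoire de la musique", "doc1"),
    ("s2", "fr", "histoire de la guerre", "doc2"),
    ("s3", "de", "geschichte der musik", "doc1"),
]


def trigrams(text):
    """Character trigrams of the lower-cased text, padded with one space each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramLanguageIdentifier:
    """Naive Bayes over character trigrams with add-one smoothing (a stand-in)."""

    def __init__(self, training_text):
        self.counts = {lang: Counter(trigrams(t)) for lang, t in training_text.items()}
        vocab = set()
        for counts in self.counts.values():
            vocab.update(counts)
        self.vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen trigrams
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}

    def identify(self, query):
        """Return the language whose trigram model gives the query the highest log-probability."""
        def log_prob(lang):
            denom = self.totals[lang] + self.vocab_size
            return sum(math.log((self.counts[lang][t] + 1) / denom) for t in trigrams(query))
        return max(self.counts, key=log_prob)


def build_query_log_index(log):
    """Index each clicked document by the words of the queries that led to it."""
    index = defaultdict(Counter)  # term -> Counter(doc_id -> number of uses)
    for _session, _interface, query, doc in log:
        for term in query.lower().split():
            index[term][doc] += 1
    return index


def search(index, query):
    """Rank documents by summed query-log counts of the query's terms (simple stand-in scorer)."""
    scores = Counter()
    for term in query.lower().split():
        scores.update(index.get(term, {}))
    return [doc for doc, _ in scores.most_common()]


def language_matrices(log, identifier):
    """Count (previous, next) query-language pairs within each session and
    (interface language, query language) pairs over all queries."""
    transitions = Counter()
    interface_vs_query = Counter()
    last_lang = {}  # session_id -> language of that session's previous query
    for session, interface, query, _doc in log:
        lang = identifier.identify(query)
        interface_vs_query[(interface, lang)] += 1
        if session in last_lang:
            transitions[(last_lang[session], lang)] += 1
        last_lang[session] = lang
    return transitions, interface_vs_query


if __name__ == "__main__":
    identifier = TrigramLanguageIdentifier(TRAINING_TEXT)
    index = build_query_log_index(LOG)
    print(search(index, "musique"))  # ['doc1']
    transitions, interface_vs_query = language_matrices(LOG, identifier)
    print(dict(transitions))
    print(dict(interface_vs_query))
```

In this toy log, `doc1` becomes retrievable through English, French, and German query terms alike. This is one plausible reading of the multilingual usefulness the abstract mentions. The two `Counter` objects play the role of the session-persistence and interface-language matrices. A realistic system would need much larger training corpora and a proper ranking function.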

Author information

Authors and Affiliations

Dept. of Computing, Engineering and Technology, DGIC, University of Sunderland, St. Peter’s Campus, Sunderland, SR6 0DD, England
Michael Oakes & Yan Xu

Editor information

Editors and Affiliations

ISTI-CNR, Area Ricerca CNR, Via Moruzzi, 1, 56124, Pisa, Italy
Carol Peters
Department of Information Engineering, University of Padua, Via Gradenigo, 6/a, 35131, Padova, Italy
Giorgio Maria Di Nunzio
Aalto University, P.O. Box 15400, 00076, Aalto, Finland
Mikko Kurimo
University of Hildesheim, 31141, Hildesheim, Germany
Thomas Mandl
ELDA/ELRA, 75013, Paris, France
Djamel Mostefa
LSI-UNED, 28040, Madrid, Spain
Anselmo Peñas
Matrixware, 1060, Vienna, Austria
Giovanna Roda

About this paper

Cite this paper

Oakes, M., Xu, Y. (2010). A Search Engine Based on Query Logs, and Search Log Analysis by Automatic Language Identification. In: Peters, C., et al. Multilingual Information Access Evaluation I. Text Retrieval Experiments. CLEF 2009. Lecture Notes in Computer Science, vol 6241. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15754-7_64

DOI: https://doi.org/10.1007/978-3-642-15754-7_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15753-0
Online ISBN: 978-3-642-15754-7
eBook Packages: Computer Science, Computer Science (R0)