Article
DOI: 10.1145/1076034.1076121

Learning to estimate query difficulty: including applications to missing content detection and distributed information retrieval

Published: 15 August 2005

ABSTRACT

In this article we present novel learning methods for estimating the quality of results returned by a search engine in response to a query. Estimation is based on the agreement between the top results of the full query and the top results of its sub-queries. We demonstrate the usefulness of quality estimation for several applications, among them improving retrieval, detecting queries for which no relevant content exists in the document collection, and distributed information retrieval. Experiments on TREC data demonstrate the robustness and effectiveness of our learning algorithms.
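To make the agreement-based estimation concrete, below is a minimal Python sketch of the idea described above: it compares the top-k documents retrieved for the full query against those retrieved for each single-term sub-query, and averages the overlaps into a quality score. The toy search engine, the use of single-term sub-queries, and the unweighted mean are illustrative assumptions; the paper learns the estimator from such agreement features rather than fixing an aggregation rule.

from typing import Callable, List


def top_k_overlap(full_results: List[str], sub_results: List[str], k: int = 10) -> float:
    # Fraction of the sub-query's top-k documents that also appear
    # in the full query's top-k documents.
    full_top = set(full_results[:k])
    sub_top = set(sub_results[:k])
    if not sub_top:
        return 0.0
    return len(full_top & sub_top) / len(sub_top)


def estimate_quality(query_terms: List[str],
                     search: Callable[[List[str]], List[str]],
                     k: int = 10) -> float:
    # High agreement between the full query and its sub-queries is
    # taken as evidence of an easy query; low agreement predicts a
    # difficult one. The unweighted mean here is a placeholder for
    # the learned estimator described in the paper.
    full_results = search(query_terms)
    overlaps = [top_k_overlap(full_results, search([t]), k)
                for t in query_terms]
    return sum(overlaps) / len(overlaps) if overlaps else 0.0


if __name__ == "__main__":
    # Toy engine over a tiny corpus, purely for demonstration.
    corpus = {"d1": "query difficulty estimation",
              "d2": "distributed information retrieval",
              "d3": "missing content detection"}

    def search(terms: List[str]) -> List[str]:
        # Rank documents by the number of matching terms (toy scorer).
        scores = {doc: sum(t in text for t in terms)
                  for doc, text in corpus.items()}
        return [doc for doc, s in sorted(scores.items(),
                                         key=lambda kv: -kv[1]) if s > 0]

    # Both sub-queries agree with the full query here, so the
    # predicted quality is high (1.0), i.e. an "easy" query.
    print(estimate_quality(["distributed", "retrieval"], search, k=2))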


Published in

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005, 708 pages
ISBN: 1595930345
DOI: 10.1145/1076034
Copyright © 2005 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
