ABSTRACT
In this article we present novel learning methods for estimating the quality of results returned by a search engine in response to a query. The estimation is based on the agreement between the top results of the full query and the top results of its sub-queries. We demonstrate the usefulness of quality estimation for several applications, including improving retrieval effectiveness, detecting queries for which no relevant content exists in the document collection, and distributed information retrieval. Experiments on TREC data demonstrate the robustness and effectiveness of our learning algorithms.
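The core signal described above can be illustrated with a minimal sketch: measure how much the top-k results of the full query overlap with the top-k results of each sub-query, and average the agreement. This is an illustrative simplification, not the paper's actual method (which learns an estimator over such agreement features); the function names and the choice of plain set overlap as the agreement measure are assumptions for this example.

```python
def topk_overlap(full_results, sub_results, k=10):
    """Fraction of the full query's top-k documents that also appear
    in a sub-query's top-k documents. Results are ranked document IDs."""
    full_top = set(full_results[:k])
    sub_top = set(sub_results[:k])
    return len(full_top & sub_top) / k

def estimate_query_quality(full_results, subquery_results_list, k=10):
    """Average agreement between the full query and its sub-queries.
    Low agreement is taken as evidence of a difficult query."""
    if not subquery_results_list:
        return 0.0
    overlaps = [topk_overlap(full_results, subs, k)
                for subs in subquery_results_list]
    return sum(overlaps) / len(overlaps)
```

For example, if one sub-query returns exactly the full query's top ten documents and another shares only five of them, the estimate is the mean of 1.0 and 0.5, i.e. 0.75.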