Article
DOI: 10.1145/1076034.1076121

Learning to estimate query difficulty: including applications to missing content detection and distributed information retrieval

Published: 15 August 2005

ABSTRACT

In this article we present novel learning methods for estimating the quality of results returned by a search engine in response to a query. Estimation is based on the agreement between the top results of the full query and the top results of its sub-queries. We demonstrate the usefulness of quality estimation for several applications, among them improving retrieval, detecting queries for which no relevant content exists in the document collection, and distributed information retrieval. Experiments on TREC data demonstrate the robustness and effectiveness of our learning algorithms.
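To make the agreement-based estimation concrete, below is a minimal Python sketch of the idea described above: it compares the top-k documents retrieved for the full query against those retrieved for each single-term sub-query, and averages the overlaps into a quality score. The toy search engine, the use of single-term sub-queries, and the unweighted mean are illustrative assumptions; the paper learns the estimator from such agreement features rather than fixing an aggregation rule.

from typing import Callable, List


def top_k_overlap(full_results: List[str], sub_results: List[str], k: int = 10) -> float:
    # Fraction of the sub-query's top-k documents that also appear
    # in the full query's top-k documents.
    full_top = set(full_results[:k])
    sub_top = set(sub_results[:k])
    if not sub_top:
        return 0.0
    return len(full_top & sub_top) / len(sub_top)


def estimate_quality(query_terms: List[str],
                     search: Callable[[List[str]], List[str]],
                     k: int = 10) -> float:
    # High agreement between the full query and its sub-queries is
    # taken as evidence of an easy query; low agreement predicts a
    # difficult one. The unweighted mean here is a placeholder for
    # the learned estimator described in the paper.
    full_results = search(query_terms)
    overlaps = [top_k_overlap(full_results, search([t]), k)
                for t in query_terms]
    return sum(overlaps) / len(overlaps) if overlaps else 0.0


if __name__ == "__main__":
    # Toy engine over a tiny corpus, purely for demonstration.
    corpus = {"d1": "query difficulty estimation",
              "d2": "distributed information retrieval",
              "d3": "missing content detection"}

    def search(terms: List[str]) -> List[str]:
        # Rank documents by the number of matching terms (toy scorer).
        scores = {doc: sum(t in text for t in terms)
                  for doc, text in corpus.items()}
        return [doc for doc, s in sorted(scores.items(),
                                         key=lambda kv: -kv[1]) if s > 0]

    # Both sub-queries agree with the full query here, so the
    # predicted quality is high (1.0), i.e. an "easy" query.
    print(estimate_quality(["distributed", "retrieval"], search, k=2))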


Published in

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005, 708 pages
ISBN: 1595930345
DOI: 10.1145/1076034
Copyright © 2005 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
