DOI: 10.1145/2983323.2983737
research-article

Selective Cluster-Based Document Retrieval

Published: 24 October 2016

ABSTRACT

We address the long-standing challenge of selective cluster-based retrieval: deciding, on a per-query basis, whether to apply cluster-based document retrieval or standard document retrieval. To address this classification task, we propose several sets of features based on those used by the cluster-based ranker, on query-performance predictors, and on properties of the clustering structure. Empirical evaluation shows that our method outperforms state-of-the-art retrieval approaches, including cluster-based, query-expansion, and term-proximity methods.
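
The per-query decision described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three feature names, the toy training data, and the use of a plain logistic-regression classifier as a stand-in (the abstract does not say which classifier or which concrete features the authors use). The sketch only shows the shape of the task: one feature vector per query, a binary label saying whether the cluster-based ranking was the better choice, and a learned selector that picks one of the two rankings.

```python
import math
from typing import List, Sequence

# Hypothetical per-query features (assumed, not taken from the paper), each in [0, 1]:
#   [cluster_ranker_score, query_performance_prediction, cluster_cohesion]
TOY_FEATURES: List[List[float]] = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.8],
    [0.2, 0.3, 0.1],
    [0.1, 0.2, 0.3],
    [0.3, 0.1, 0.2],
]
# Toy labels: 1 = cluster-based retrieval was better for that query, 0 = standard retrieval.
TOY_LABELS: List[int] = [1, 1, 1, 0, 0, 0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_selector(features: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   epochs: int = 500,
                   lr: float = 0.5) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent (bias stored last)."""
    n_features = len(features[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(features, labels):
            z = sum(w * v for w, v in zip(weights, x)) + weights[-1]
            err = sigmoid(z) - y
            for i, v in enumerate(x):
                grad[i] += err * v
            grad[-1] += err
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad)]
    return weights


def use_cluster_ranker(weights: Sequence[float],
                       x: Sequence[float],
                       threshold: float = 0.5) -> bool:
    """Return True when the selector predicts cluster-based retrieval for this query."""
    z = sum(w * v for w, v in zip(weights, x)) + weights[-1]
    return sigmoid(z) >= threshold


def select_ranking(weights: Sequence[float],
                   x: Sequence[float],
                   cluster_ranking: List[str],
                   document_ranking: List[str]) -> List[str]:
    """Pick one of the two precomputed rankings for a single query."""
    return cluster_ranking if use_cluster_ranker(weights, x) else document_ranking


if __name__ == "__main__":
    selector = train_selector(TOY_FEATURES, TOY_LABELS)
    chosen = select_ranking(selector, [0.85, 0.7, 0.8],
                            cluster_ranking=["d3", "d1", "d2"],
                            document_ranking=["d1", "d2", "d3"])
    print(chosen)
```

In a real setting the labels would come from comparing the two rankers' effectiveness on training queries, and any standard classifier could replace the hand-written logistic regression.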

    Published in

      CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
      October 2016
      2566 pages
      ISBN: 9781450340731
      DOI: 10.1145/2983323

      Copyright © 2016 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CIKM '16 paper acceptance rate: 160 of 701 submissions, 23%
      Overall acceptance rate: 1,861 of 8,427 submissions, 22%
