DOI: 10.1145/2872518.2889370
poster

Cluster Hypothesis in Low-Cost IR Evaluation with Different Document Representations

Published: 11 April 2016

ABSTRACT

Offline evaluation in information retrieval compares the performance of retrieval systems using relevance judgments for a set of test queries. Because manual judgments are expensive, selective labeling has been developed to label documents semi-automatically by exploiting the similarity among retrieved documents. Intuitively, the degree to which the retrieved documents agree with the cluster hypothesis directly determines how many manual judgments a semi-automatic labeling method can save. However, every document representation loses some information. We argue that a better document representation can lead to better agreement with the cluster hypothesis. To this end, we investigate different document representations on established benchmarks in the context of low-cost evaluation, showing that they differ in how well they capture document similarity relative to a query.
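The link the abstract draws, that labels can only be inferred from similar documents to the extent that similar documents share relevance, can be illustrated with a minimal Python sketch. It assumes documents have already been mapped to dense vectors by some representation, uses cosine similarity, measures agreement with the cluster hypothesis as the share of judged documents whose nearest judged neighbour carries the same label, and infers labels for unjudged documents by a k-nearest-neighbour majority vote. The function names (`nn_agreement`, `propagate_labels`), the toy vectors, and both rules are hypothetical illustrations, not the selective labeling method or agreement measure used in the poster.

```python
import math
from typing import Dict, Iterable, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nn_agreement(vectors: Dict[str, Vector], labels: Dict[str, int]) -> float:
    """Hypothetical cluster-hypothesis check: share of judged documents whose
    most similar *other* judged document has the same relevance label."""
    ids = list(labels)
    if len(ids) < 2:
        raise ValueError("need at least two judged documents")
    agree = 0
    for d in ids:
        nearest = max((o for o in ids if o != d),
                      key=lambda o: cosine(vectors[d], vectors[o]))
        agree += labels[nearest] == labels[d]
    return agree / len(ids)


def propagate_labels(vectors: Dict[str, Vector], labels: Dict[str, int],
                     unjudged: Iterable[str], k: int = 3) -> Dict[str, int]:
    """Hypothetical semi-automatic labeling: assign each unjudged document the
    majority binary label of its k most similar judged documents."""
    judged = list(labels)
    predicted: Dict[str, int] = {}
    for d in unjudged:
        neighbours = sorted(judged, key=lambda o: cosine(vectors[d], vectors[o]),
                            reverse=True)[:k]
        votes = sum(labels[o] for o in neighbours)
        predicted[d] = 1 if 2 * votes > len(neighbours) else 0
    return predicted


# Toy judgments for one query (1 = relevant, 0 = non-relevant).
judged = {"d1": 1, "d2": 1, "d3": 0, "d4": 0}

# Representation A places relevant documents close to each other ...
rep_a = {"d1": [1.0, 0.1], "d2": [0.9, 0.2], "d3": [0.1, 1.0],
         "d4": [0.2, 0.9], "u1": [0.95, 0.15]}
# ... representation B places each relevant document next to a non-relevant one.
rep_b = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 0.1],
         "d4": [0.1, 1.0], "u1": [0.95, 0.15]}

print(nn_agreement(rep_a, judged))              # 1.0
print(nn_agreement(rep_b, judged))              # 0.0
print(propagate_labels(rep_a, judged, ["u1"]))  # {'u1': 1}
```

Under representation A every judged document's nearest neighbour shares its label, so labels propagated through it are more likely to be correct; under B the agreement score drops to zero and propagated labels would be unreliable. Comparing representations then only means passing different vectors for the same documents.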


Published in: WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web
April 2016
1094 pages
ISBN: 9781450341448

Copyright © 2016 held by the owner/author(s).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland


Acceptance rates: WWW '16 Companion paper acceptance rate 115 of 727 submissions (16%); overall acceptance rate 1,899 of 8,196 submissions (23%).
