poster

Cluster Hypothesis in Low-Cost IR Evaluation with Different Document Representations

Authors:
Kai Hui, Max Planck Institute for Informatics, Saarbrücken, Germany
Klaus Berberich, Max Planck Institute for Informatics, Saarbrücken, Germany

WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web, April 2016, Pages 47–48. https://doi.org/10.1145/2872518.2889370

Published: 11 April 2016

ABSTRACT

Offline evaluation for information retrieval compares the performance of retrieval systems based on relevance judgments for a set of test queries. Since manual judgments are expensive, selective labeling has been developed to label documents semi-automatically by exploiting the similarity relationships among retrieved documents. Intuitively, the degree of agreement with the cluster hypothesis directly determines how many manual judgments can be saved by creating labels with a semi-automatic method. Representing documents, however, inevitably loses some information. We argue that a better document representation can lead to better agreement with the cluster hypothesis. To this end, we investigate different document representations on established benchmarks in the context of low-cost evaluation, showing that different document representations vary in how well they capture document similarity relative to a query.
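
As a rough illustration of what agreement with the cluster hypothesis can mean in practice, the sketch below applies a nearest-neighbour test: for each relevant document, it measures the fraction of relevant documents among its k most similar documents. This is an assumption made for illustration and not necessarily the measure used in the poster. The toy documents, the relevance labels, and the helper names `term_frequency` and `nn_agreement` are all hypothetical, and the bag-of-words representation is only one possible choice.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def term_frequency(text):
    """One possible document representation: raw term counts."""
    return Counter(text.lower().split())


def nn_agreement(vectors, relevant, k=2):
    """Average fraction of relevant documents among the k nearest
    neighbours of each relevant document (higher means better agreement)."""
    scores = []
    for d in relevant:
        others = [o for o in vectors if o != d]
        # Rank all other documents by similarity to d, most similar first.
        others.sort(key=lambda o: cosine(vectors[d], vectors[o]), reverse=True)
        top = others[:k]
        scores.append(sum(o in relevant for o in top) / len(top))
    return sum(scores) / len(scores)


# Hypothetical documents retrieved for one query, with toy relevance labels.
docs = {
    "d1": "jaguar cat habitat rainforest",
    "d2": "jaguar cat prey rainforest",
    "d3": "jaguar cat cub habitat",
    "d4": "jaguar car engine speed",
    "d5": "jaguar car dealer price",
}
relevant = {"d1", "d2", "d3"}

vectors = {name: term_frequency(text) for name, text in docs.items()}
agreement = nn_agreement(vectors, relevant, k=2)
print(agreement)
```

Because `nn_agreement` only needs a mapping from document names to sparse vectors, any other representation that produces such vectors could be swapped in for `term_frequency` and compared on the same labels.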

Index Terms

1. Information systems
  1. Information retrieval

Published in
WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web
April 2016
1094 pages
ISBN: 9781450341448
General Chairs:
Jacqueline Bourdeau, Tele-university (TELUQ), Montreal, QC, Canada
Jim A. Hendler, Rensselaer Polytechnic Institute, Troy, NY, USA
Roger Nkambou, Université du Québec à Montréal, Montreal, QC, Canada
Program Chairs:
Ian Horrocks, University of Oxford, UK
Ben Y. Zhao, University of California at Santa Barbara, CA, USA
Copyright © 2016 held by the owner/author(s).
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland
Author Tags
cluster hypothesis
document representation
low-cost evaluation
Acceptance Rates
WWW '16 Companion Paper Acceptance Rate: 115 of 727 submissions, 16%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%