Abstract
We present a study of the contributions of three classes of ranking signals: BM25F, a retrieval function that is based on words in the content of web pages and the anchors that link to them; SALSA, a link-based feature that takes all or part of the result set to a query as input; and matching-anchor count (MAC), a feature that measures precise matches between queries and anchors pointing to result pages. All three features incorporate both link and textual features, but to varying degrees. BM25F is the state-of-the-art exponent of Salton’s term-vector model, and is based on a solid theoretical foundation; the two other features are somewhat more ad hoc. We studied the impact of two factors that go into the formation of SALSA’s “base” set: whether to use conjunctive or disjunctive query semantics, and how many results to include in the base set. We found that the choice of query semantics has little impact on the effectiveness of SALSA (with conjunctive semantics having a slight edge); more surprisingly, we found that limiting the size of the base set to a few hundred results of high expected quality maximizes performance. Furthermore, we experimented with various linear combinations of BM25F, MAC and SALSA. In doing so, we made a remarkable observation: adding BM25F to a two-way weighted linear combination of MAC and SALSA does not increase performance in any statistically significant way.
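To make the idea of a two-way weighted linear combination concrete, here is a minimal sketch in Python. The abstract does not say how scores are normalized or how weights are chosen, so the min-max normalization, the single `weight` parameter, and the function names `min_max_normalize`, `combine`, and `rank` are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] (assumed normalization, not from the paper)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so the feature carries no ranking information.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(mac: Dict[str, float], salsa: Dict[str, float],
            weight: float) -> Dict[str, float]:
    """Score each document as weight * MAC + (1 - weight) * SALSA."""
    mac_n = min_max_normalize(mac)
    salsa_n = min_max_normalize(salsa)
    docs = set(mac_n) | set(salsa_n)
    # A document missing from one feature contributes 0.0 for that feature.
    return {
        d: weight * mac_n.get(d, 0.0) + (1.0 - weight) * salsa_n.get(d, 0.0)
        for d in docs
    }


def rank(scores: Dict[str, float]) -> List[str]:
    """Order documents by descending score, breaking ties by document id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical per-document feature values for one query.
mac_scores = {"a": 3.0, "b": 0.0, "c": 1.0}
salsa_scores = {"a": 0.1, "b": 0.5, "c": 0.3}
ranking = rank(combine(mac_scores, salsa_scores, weight=0.5))
```

Sweeping `weight` over a grid and evaluating each resulting ranking against relevance judgments is one plausible way to explore such combinations; a three-way combination would add a second weight for BM25F.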
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Craswell, N., Fetterly, D., Najork, M. (2011). The Power of Peers. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_49
DOI: https://doi.org/10.1007/978-3-642-20161-5_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5
eBook Packages: Computer Science (R0)