The Power of Peers

  • Conference paper
Advances in Information Retrieval (ECIR 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6611)

Abstract

We present a study of the contributions of three classes of ranking signals: BM25F, a retrieval function that is based on words in the content of web pages and the anchors that link to them; SALSA, a link-based feature that takes all or part of the result set to a query as input; and matching-anchor count (MAC), a feature that measures precise matches between queries and anchors pointing to result pages. All three signals incorporate both link and textual information, but to varying degrees. BM25F is the state-of-the-art exponent of Salton’s term-vector model, and is based on a solid theoretical foundation; the two other features are somewhat more ad hoc. We studied the impact of two factors that go into the formation of SALSA’s “base” set: whether to use conjunctive or disjunctive query semantics, and how many results to include in the base set. We found that the choice of query semantics has little impact on the effectiveness of SALSA (with conjunctive semantics having a slight edge); more surprisingly, we found that limiting the size of the base set to a few hundred results of high expected quality maximizes performance. Furthermore, we experimented with various linear combinations of BM25F, MAC and SALSA. In doing so, we made a remarkable observation: adding BM25F to a two-way weighted linear combination of MAC and SALSA does not increase performance in any statistically significant way.
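To make the idea of a two-way weighted linear combination concrete, the following is a minimal sketch in Python. It is an illustration only, not the authors' implementation: the abstract does not say how feature scores are scaled or how weights are chosen, so the min-max normalization, the function names (`min_max_normalize`, `combine`, `rank`), the document identifiers, and every score and weight below are assumptions made for this example.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0.0.

    Assumption: the abstract does not describe any score scaling.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(mac: Dict[str, float], salsa: Dict[str, float],
            weight: float) -> Dict[str, float]:
    """Two-way weighted linear combination of two per-document features.

    `weight` is applied to the first feature and `1 - weight` to the second.
    Both dictionaries are expected to cover the same documents.
    """
    mac_n = min_max_normalize(mac)
    salsa_n = min_max_normalize(salsa)
    return {doc: weight * mac_n[doc] + (1.0 - weight) * salsa_n[doc]
            for doc in mac_n}


def rank(scores: Dict[str, float]) -> List[str]:
    """Order documents by descending score, breaking ties by identifier."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical feature values for three result pages (not from the paper).
mac = {"a": 3.0, "b": 0.0, "c": 1.0}
salsa = {"a": 0.2, "b": 0.5, "c": 0.1}

combined = combine(mac, salsa, weight=0.5)
print(rank(combined))
```

Sweeping `weight` from 0 to 1 and scoring each resulting ranking with an effectiveness measure is one simple way to explore such combinations; a third feature such as BM25F could be added by giving it a weight of its own.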

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Craswell, N., Fetterly, D., Najork, M. (2011). The Power of Peers. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_49

  • DOI: https://doi.org/10.1007/978-3-642-20161-5_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20160-8

  • Online ISBN: 978-3-642-20161-5

  • eBook Packages: Computer Science (R0)
