DOI: 10.1145/1321440.1321465
research-article

Comparing the effectiveness of HITS and SALSA

Published: 06 November 2007

Abstract

This paper compares the effectiveness of two well-known query-dependent link-based ranking algorithms, "Hyperlink-Induced Topic Search" (HITS) and the "Stochastic Approach for Link-Structure Analysis" (SALSA). The two algorithms are evaluated on a very large web graph induced by 463 million crawled web pages and a set of 28,043 queries and 485,656 results labeled by human judges. We employed three different performance measures: mean average precision (MAP), mean reciprocal rank (MRR), and normalized discounted cumulative gain (NDCG). We found that as an isolated feature, SALSA substantially outperforms HITS. This is quite surprising, given that the two algorithms operate over the same neighborhood graph induced by the query result set. We also studied the combination of SALSA and HITS with BM25F, a state-of-the-art text-based scoring function that incorporates anchor text. We found that the combination of SALSA and BM25F outperforms the combination of HITS and BM25F. Finally, we broke down our query set by query specificity, and found that SALSA (and to a lesser extent HITS) is most effective for general queries.
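To make the contrast between the two algorithms concrete, the following is a minimal sketch of how HITS and SALSA authority scores can be computed on the same small link graph. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names, the edge-list input format, the fixed iteration count, and the toy graph are all hypothetical, and the construction of the query-specific neighborhood graph is omitted entirely.

```python
from collections import defaultdict
from math import sqrt


def _l2_normalize(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = sqrt(sum(x * x for x in scores.values())) or 1.0
    return {n: x / norm for n, x in scores.items()}


def hits_authority(edges, iterations=50):
    """HITS authority scores via mutual reinforcement of hubs and authorities.

    `edges` is a list of (source, target) links; this input format is an
    assumption made for the sketch.
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 0.0 for n in nodes}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of the pages linking to v.
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        auth = _l2_normalize(auth)
        # Hub score of u: sum of authority scores of the pages u links to.
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        hub = _l2_normalize(hub)
    return auth


def salsa_authority(edges, iterations=50):
    """SALSA authority scores as a two-step random walk on authorities.

    One step goes backward over a uniformly chosen in-link to a hub, then
    forward over a uniformly chosen out-link of that hub.
    """
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    authorities = list(in_deg)
    # Start from the uniform distribution over pages with at least one in-link.
    auth = {v: 1.0 / len(authorities) for v in authorities}
    for _ in range(iterations):
        # Backward half-step: each authority splits its mass over its in-links.
        hub_mass = defaultdict(float)
        for u, v in edges:
            hub_mass[u] += auth[v] / in_deg[v]
        # Forward half-step: each hub splits its mass over its out-links.
        new_auth = {v: 0.0 for v in authorities}
        for u, v in edges:
            new_auth[v] += hub_mass[u] / out_deg[u]
        auth = new_auth
    return auth


# Hypothetical toy graph: pages a, b, c link to x; page c also links to y.
example_edges = [("a", "x"), ("b", "x"), ("c", "x"), ("c", "y")]
print("HITS: ", hits_authority(example_edges))
print("SALSA:", salsa_authority(example_edges))
```

On this connected toy graph the SALSA scores settle at values proportional to in-degree (x has three in-links, y has one), while the HITS scores are the principal eigenvector of the co-citation matrix and so weight the two pages differently. Both functions read the same edge list, which mirrors the abstract's point that the two algorithms operate over the same neighborhood graph.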

Published In

CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
November 2007
1048 pages
ISBN:9781595938039
DOI:10.1145/1321440
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. HITS
  2. link-based ranking
  3. retrieval performance
  4. SALSA
  5. web search

Qualifiers

  • Research-article

Conference

CIKM07

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2023) Graph structure and homophily for label propagation in Graph Neural Networks. 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pages 194-201. DOI: 10.1109/MCSoC60832.2023.00037. Online publication date: 18-Dec-2023
  • (2022) PGPregel. Proceedings of the 13th Symposium on Cloud Computing, pages 386-402. DOI: 10.1145/3542929.3563474. Online publication date: 7-Nov-2022
  • (2021) Efficient Scalable Temporal Web Graph Store. 2021 IEEE International Conference on Big Data (Big Data), pages 263-273. DOI: 10.1109/BigData52589.2021.9671984. Online publication date: 15-Dec-2021
  • (2019) Model-Based Prediction of the Size, the Language and the Quality of the Web Domains. Intelligent Systems Applications in Software Engineering, pages 209-225. DOI: 10.1007/978-3-030-30329-7_20. Online publication date: 20-Sep-2019
  • (2018) Web Search Relevance Ranking. Encyclopedia of Database Systems, pages 4650-4655. DOI: 10.1007/978-1-4614-8265-9_463. Online publication date: 7-Dec-2018
  • (2017) Web Search Relevance Ranking. Encyclopedia of Database Systems, pages 1-5. DOI: 10.1007/978-1-4899-7993-3_463-3. Online publication date: 11-Feb-2017
  • (2016) Web Search Relevance Ranking. Encyclopedia of Database Systems, pages 1-5. DOI: 10.1007/978-1-4899-7993-3_463-2. Online publication date: 16-Nov-2016
  • (2015) A distributed platform to ease the development of recommendation algorithms on large-scale graphs. Proceedings of the 24th International Conference on Artificial Intelligence, pages 4353-4354. DOI: 10.5555/2832747.2832869. Online publication date: 25-Jul-2015
  • (2015) Unsupervised Spam Detection in Hyves Using SALSA. Proceedings of the 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA) 2015, pages 517-526. DOI: 10.1007/978-81-322-2695-6_43. Online publication date: 25-Oct-2015
  • (2015) Summarizing Information by Means of Causal Sentences Through Causal Questions. 10th International Conference on Soft Computing Models in Industrial and Environmental Applications, pages 353-363. DOI: 10.1007/978-3-319-19719-7_31. Online publication date: 27-May-2015
