DOI: 10.1145/1772690.1772785
research-article

Visualizing differences in web search algorithms using the expected weighted Hoeffding distance

Published: 26 April 2010

Abstract

We introduce a new dissimilarity function for ranked lists, the expected weighted Hoeffding distance, which has several advantages over current dissimilarity measures for ranked search results. First, it is easily customized for users who pay varying degrees of attention to websites at different ranks. Second, unlike existing measures such as generalized Kendall's tau, it is based on a true metric, so meaningful embeddings are preserved when visualization techniques like multidimensional scaling are applied. Third, it effectively handles partial or missing rank information while retaining a probabilistic interpretation. Finally, the measure can be made computationally tractable, and we give a highly efficient algorithm for computing it. We then combine our new metric with multidimensional scaling to visualize and explore relationships between the result sets of different search engines, showing how the weighted Hoeffding distance can distinguish important differences in search engine behavior that are not apparent with other rank-distance metrics. Such visualizations are highly effective for summarizing and analyzing which search engines to use, what search strategies users can employ, and how search results evolve over time. We demonstrate our techniques using a collection of popular search engines, a representative set of queries, and frequently used query manipulation methods.
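The abstract does not state the formula for the expected weighted Hoeffding distance or its efficient algorithm, so the Python sketch below does not reproduce either. It is a hypothetical stand-in that illustrates the three ingredients the abstract names, under these assumptions: (1) a user's attention is modelled by a positive weight per rank position (`weights`, values invented), and two full rankings are compared by the L1 distance between each item's cumulative-weight position, which is a true metric when the weights are positive; (2) a top-k list is handled by averaging that distance over all uniformly random orderings of its unranked items, found by brute-force enumeration; (3) the resulting distance matrix is embedded in two dimensions with classical (Torgerson) multidimensional scaling. All function names and data are illustrative.

```python
"""Hypothetical sketch: a rank-weighted distance between ranked lists,
its average over completions of top-k lists, and classical MDS.
Not the paper's expected weighted Hoeffding distance."""
from itertools import permutations
from typing import Iterator, List, Sequence

import numpy as np


def cumulative_positions(weights: Sequence[float]) -> List[float]:
    """Map rank r (0-based) to the total attention weight of ranks above it."""
    positions, total = [], 0.0
    for w in weights:
        positions.append(total)
        total += w
    return positions


def weighted_rank_distance(a: Sequence[str], b: Sequence[str],
                           weights: Sequence[float]) -> float:
    """L1 distance between two full rankings of the same items, measured in
    cumulative-weight units; a true metric when all weights are positive."""
    if len(a) != len(b) or set(a) != set(b) or len(a) != len(weights):
        raise ValueError("rankings must share items and match len(weights)")
    g = cumulative_positions(weights)
    pos_b = {item: r for r, item in enumerate(b)}
    return sum(abs(g[r] - g[pos_b[item]]) for r, item in enumerate(a))


def completions(top: Sequence[str], universe: Sequence[str]) -> Iterator[List[str]]:
    """Yield every full ranking of `universe` that starts with the top-k list."""
    rest = [item for item in universe if item not in top]
    for tail in permutations(rest):
        yield list(top) + list(tail)


def expected_distance(top_a: Sequence[str], top_b: Sequence[str],
                      universe: Sequence[str], weights: Sequence[float]) -> float:
    """Mean distance over independent, uniformly random completions of two
    top-k lists (brute force: only feasible for a handful of items)."""
    total, count = 0.0, 0
    for full_a in completions(top_a, universe):
        for full_b in completions(top_b, universe):
            total += weighted_rank_distance(full_a, full_b, weights)
            count += 1
    return total / count


def classical_mds(dist: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: double-center the squared distances, then
    keep the top eigenvectors scaled by sqrt of their (clipped) eigenvalues."""
    n = dist.shape[0]
    center = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * center @ (dist ** 2) @ center
    vals, vecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


if __name__ == "__main__":
    weights = [1.0, 0.5, 0.25, 0.125]  # hypothetical attention per rank
    engines = {                        # hypothetical result lists
        "A": ["u1", "u2", "u3", "u4"],
        "B": ["u2", "u1", "u3", "u4"],  # top two swapped
        "C": ["u1", "u2", "u4", "u3"],  # bottom two swapped
    }
    names = list(engines)
    dist = np.array([[weighted_rank_distance(engines[p], engines[q], weights)
                      for q in names] for p in names])
    print(dist)
    print(classical_mds(dist))
    print(expected_distance(["u1"], ["u1"], ["u1", "u2", "u3"], weights[:3]))
```

With these example weights, swapping the top two results costs 2.0 while swapping the bottom two costs only 0.5, which is the kind of rank sensitivity the abstract describes. The averaged distance of a top-k list to itself is generally nonzero here (0.5 for `["u1"]` over three items), and the enumeration grows factorially, so this sketch says nothing about how the paper keeps a metric or obtains an efficient algorithm for partial lists.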

Published In

WWW '10: Proceedings of the 19th international conference on World wide web
April 2010
1407 pages
ISBN:9781605587998
DOI:10.1145/1772690

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. expected weighted Hoeffding distance
  2. ranking
  3. search algorithm dissimilarity

Conference

WWW '10
WWW '10: The 19th International World Wide Web Conference
April 26 - 30, 2010
Raleigh, North Carolina, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0
Reflects downloads up to 19 Feb 2025

Cited By

  • (2023) Preference-Based Offline Evaluation. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pp. 1248-1251. DOI: 10.1145/3539597.3572725. Online publication date: 27-Feb-2023.
  • (2020) Offline Evaluation without Gain. Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval, pp. 185-192. DOI: 10.1145/3409256.3409816. Online publication date: 14-Sep-2020.
  • (2019) RankBrushers: interactive analysis of temporal ranking ensembles. Journal of Visualization 22:6, pp. 1241-1255. DOI: 10.1007/s12650-019-00598-x. Online publication date: 23-Sep-2019.
  • (2018) Dynamic Shard Cutoff Prediction for Selective Search. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 85-94. DOI: 10.1145/3209978.3210005. Online publication date: 27-Jun-2018.
  • (2017) Visualizing Rank Time Series of Wikipedia Top-Viewed Pages. IEEE Computer Graphics and Applications 37:2, pp. 42-53. DOI: 10.1109/MCG.2017.21. Online publication date: 1-Mar-2017.
  • (2017) A study of metrics of distance and correlation between ranked lists for compositionality detection. Cognitive Systems Research 44:C, pp. 40-49. DOI: 10.1016/j.cogsys.2017.03.001. Online publication date: 1-Aug-2017.
  • (2016) Visual exploration of latent ranking evolutions in time series. Journal of Visualization 19:4, pp. 783-795. DOI: 10.1007/s12650-016-0349-7. Online publication date: 1-Nov-2016.
  • (2015) Detecting and Visualizing Filter Bubbles in Google and Bing. Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, pp. 1851-1856. DOI: 10.1145/2702613.2732850. Online publication date: 18-Apr-2015.
  • (2015) A Family of Rank Similarity Measures Based on Maximized Effectiveness Difference. IEEE Transactions on Knowledge and Data Engineering 27:11, pp. 2865-2877. DOI: 10.1109/TKDE.2015.2448541. Online publication date: 1-Nov-2015.
  • (2014) An Axiomatic Approach to Constructing Distances for Rank Comparison and Aggregation. IEEE Transactions on Information Theory 60:10, pp. 6417-6439. DOI: 10.1109/TIT.2014.2345760. Online publication date: Oct-2014.
