Visualizing differences in web search algorithms using the expected weighted Hoeffding distance

Author:

Kevyn Collins-Thompson

WWW '10: Proceedings of the 19th international conference on World wide web

Pages 931 - 940

https://doi.org/10.1145/1772690.1772785

Published: 26 April 2010

Abstract

We introduce a new dissimilarity function for ranked lists, the expected weighted Hoeffding distance, that has several advantages over current dissimilarity measures for ranked search results. First, it is easily customized for users who pay varying degrees of attention to websites at different ranks. Second, unlike existing measures such as generalized Kendall's tau, it is based on a true metric, preserving meaningful embeddings when visualization techniques like multi-dimensional scaling are applied. Third, our measure can effectively handle partial or missing rank information while retaining a probabilistic interpretation. Finally, the measure can be made computationally tractable and we give a highly efficient algorithm for computing it. We then apply our new metric with multi-dimensional scaling to visualize and explore relationships between the result sets from different search engines, showing how the weighted Hoeffding distance can distinguish important differences in search engine behavior that are not apparent with other rank-distance metrics. Such visualizations are highly effective at summarizing and analyzing insights on which search engines to use, what search strategies users can employ, and how search results evolve over time. We demonstrate our techniques using a collection of popular search engines, a representative set of queries, and frequently used query manipulation methods.
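
The first two claims in the abstract (rank-dependent user attention, and a true metric) can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified weighted rank-displacement distance for complete rankings only, and the function names and weight values are hypothetical. Each item is charged the sum of the weights of the rank boundaries it crosses between the two lists. Because that is an L1 distance between cumulative-weight coordinates, it is symmetric and obeys the triangle inequality whenever all weights are positive. The "expected" part of the measure, which handles partial or missing ranks, is not attempted here.

```python
from itertools import accumulate


def rank_positions(ranking):
    """Map each item to its 0-based position in the ranked list."""
    return {item: pos for pos, item in enumerate(ranking)}


def weighted_rank_distance(ranking_a, ranking_b, weights):
    """Weighted displacement distance between two complete rankings.

    weights[t] is the assumed cost of moving an item across the boundary
    between rank t and rank t + 1, so len(weights) == len(ranking_a) - 1.
    Larger weights near the top model users who attend mostly to top results.
    """
    if set(ranking_a) != set(ranking_b) or len(ranking_a) != len(ranking_b):
        raise ValueError("this sketch only handles full rankings of the same items")
    if len(weights) != len(ranking_a) - 1:
        raise ValueError("need exactly one weight per adjacent pair of ranks")

    # cumulative[r] = total cost of moving from rank 0 down to rank r.
    cumulative = [0.0] + list(accumulate(weights))
    pos_a = rank_positions(ranking_a)
    pos_b = rank_positions(ranking_b)
    return sum(abs(cumulative[pos_a[x]] - cumulative[pos_b[x]]) for x in pos_a)


# Hypothetical attention weights that decay with rank.
weights = [1.0, 0.5, 0.25]
base = ["a", "b", "c", "d"]
top_swap = ["b", "a", "c", "d"]      # swap at ranks 1 and 2
bottom_swap = ["a", "b", "d", "c"]   # swap at ranks 3 and 4

d_top = weighted_rank_distance(base, top_swap, weights)
d_bottom = weighted_rank_distance(base, bottom_swap, weights)
print(d_top)     # 2.0
print(d_bottom)  # 0.5
```

With these weights a swap at the top of the list counts four times as much as the same swap at the bottom, whereas an unweighted displacement count would treat the two lists as equally far from the baseline.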

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
2. Mathematics of computing
  1. Discrete mathematics
    1. Combinatorics
      1. Permutations and combinations

Published In

WWW '10: Proceedings of the 19th international conference on World wide web

April 2010

1407 pages

ISBN: 9781605587998

DOI: 10.1145/1772690

General Chairs:
Michael Rappa, North Carolina State University, USA
Paul Jones, University of North Carolina at Chapel Hill, USA

Program Chairs:
Juliana Freire, University of Utah, USA
Soumen Chakrabarti, Indian Institute of Technology, India

Copyright © 2010 International World Wide Web Conference Committee (IW3C2).

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WWW '10

WWW '10: The 19th International World Wide Web Conference

April 26 - 30, 2010

Raleigh, North Carolina, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
