DOI: 10.1145/1498759.1498824
research-article

Generating labels from clicks

Published: 09 February 2009

Abstract

The ranking function used by search engines to order results is learned from labeled training data. Each training point is a (query, URL) pair that is labeled by a human judge who assigns a score of Perfect, Excellent, etc., depending on how well the URL matches the query. In this paper, we study whether clicks can be used to automatically generate good labels. Intuitively, documents that are clicked (resp., skipped) in aggregate can indicate relevance (resp., lack of relevance). We give a novel way of transforming clicks into weighted, directed graphs inspired by eye-tracking studies and then devise an objective function for finding cuts in these graphs that induce a good labeling. In its full generality, the problem is NP-hard, but we show that, in the case of two labels, an optimum labeling can be found in linear time. For the more general case, we propose heuristic solutions. Experiments on real click logs show that click-based labels align with the opinion of a panel of judges, especially as the consensus of the panel grows stronger.
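The two-label case can be illustrated with a minimal sketch. The abstract does not spell out the edge construction or the objective function, so both are assumptions here: edges follow a "clicked result is preferred over unclicked results ranked above it" rule, and the objective is the net agreement, that is, the weight of edges pointing from the relevant class to the non-relevant class minus the weight of edges pointing the other way. Under that assumed objective the score of a cut equals the sum, over the nodes placed in the relevant class, of (outgoing weight minus incoming weight), because edges inside the class cancel. One pass over the edges therefore yields an optimum. The function names and the session format are hypothetical.

```python
from collections import defaultdict


def build_preference_graph(sessions):
    """Turn click sessions into a weighted, directed preference graph.

    sessions: iterable of (ranked_urls, clicked_urls) pairs (hypothetical format).
    Assumed rule: a clicked URL is preferred over every unclicked URL shown
    above it, adding weight 1.0 to the edge (clicked, skipped).
    """
    weights = defaultdict(float)
    for ranked_urls, clicked in sessions:
        for pos, url in enumerate(ranked_urls):
            if url not in clicked:
                continue
            for above in ranked_urls[:pos]:
                if above not in clicked:
                    weights[(url, above)] += 1.0
    return dict(weights)


def two_label_cut(weights):
    """Label each node by the sign of its net outgoing weight.

    Maximizes the assumed net-agreement objective in time linear in the
    number of edges. Nodes with a net weight of zero or less are labeled
    "not relevant"; nodes that appear in no edge receive no label.
    """
    net = defaultdict(float)
    for (preferred, other), w in weights.items():
        net[preferred] += w
        net[other] -= w
    return {
        url: ("relevant" if score > 0 else "not relevant")
        for url, score in net.items()
    }


sessions = [
    (["a", "b", "c"], {"c"}),
    (["a", "b", "c"], {"b", "c"}),
    (["a", "b", "c"], {"a"}),
]
graph = build_preference_graph(sessions)
labels = two_label_cut(graph)
```

With more than two labels this per-node shortcut no longer applies, which is consistent with the abstract's statement that the general problem is NP-hard and handled with heuristics.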

Published In

WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining
February 2009
314 pages
ISBN:9781605583907
DOI:10.1145/1498759

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. generating training data
  2. graph partitioning

Qualifiers

  • Research-article

Conference

WSDM'09

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 1

Reflects downloads up to 08 Feb 2025.

Cited By

  • (2024) Dynamic correlation clustering in sublinear update time. Proceedings of the 41st International Conference on Machine Learning, pp. 9230-9274. DOI: 10.5555/3692070.3692438. Online publication date: 21-Jul-2024
  • (2024) Online and Offline Evaluation in Search Clarification. ACM Transactions on Information Systems, 43:1, pp. 1-30. DOI: 10.1145/3681786. Online publication date: 4-Nov-2024
  • (2024) Understanding the Cluster Linear Program for Correlation Clustering. Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pp. 1605-1616. DOI: 10.1145/3618260.3649749. Online publication date: 10-Jun-2024
  • (2024) Combinatorial Correlation Clustering. Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pp. 1617-1628. DOI: 10.1145/3618260.3649712. Online publication date: 10-Jun-2024
  • (2024) Hierarchical Correlation Clustering and Tree Preserving Embedding. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 23083-23093. DOI: 10.1109/CVPR52733.2024.02178. Online publication date: 16-Jun-2024
  • (2023) Handling Correlated Rounding Error via Preclustering: A 1.73-approximation for Correlation Clustering. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pp. 1082-1104. DOI: 10.1109/FOCS57990.2023.00065. Online publication date: 6-Nov-2023
  • (2022) Near-optimal correlation clustering with privacy. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 33702-33715. DOI: 10.5555/3600270.3602712. Online publication date: 28-Nov-2022
  • (2022) Almost 3-Approximate Correlation Clustering in Constant Rounds. 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pp. 720-731. DOI: 10.1109/FOCS54457.2022.00074. Online publication date: Oct-2022
  • (2022) Correlation Clustering with Sherali-Adams. 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pp. 651-661. DOI: 10.1109/FOCS54457.2022.00068. Online publication date: Oct-2022
  • (2021) Robust online correlation clustering. Proceedings of the 35th International Conference on Neural Information Processing Systems, pp. 4688-4698. DOI: 10.5555/3540261.3540619. Online publication date: 6-Dec-2021
