research-article

Generating labels from clicks

Authors:

P. Tsaparas

WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining

Pages 172 - 181

https://doi.org/10.1145/1498759.1498824

Published: 09 February 2009

Abstract

The ranking function used by search engines to order results is learned from labeled training data. Each training point is a (query, URL) pair that is labeled by a human judge who assigns a score of Perfect, Excellent, etc., depending on how well the URL matches the query. In this paper, we study whether clicks can be used to automatically generate good labels. Intuitively, documents that are clicked (resp., skipped) in aggregate can indicate relevance (resp., lack of relevance). We give a novel way of transforming clicks into weighted, directed graphs inspired by eye-tracking studies and then devise an objective function for finding cuts in these graphs that induce a good labeling. In its full generality, the problem is NP-hard, but we show that, in the case of two labels, an optimum labeling can be found in linear time. For the more general case, we propose heuristic solutions. Experiments on real click logs show that click-based labels align with the opinion of a panel of judges, especially as the consensus of the panel grows stronger.
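
The abstract does not spell out the graph construction or the objective function, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a "clicked URL is preferred over every skipped URL ranked above it" rule to build the weighted, directed graph, and it assumes an objective that rewards edges pointing from a higher label to a lower label and penalizes the reverse. Both choices, and the names `build_preference_graph`, `labeling_score`, and `best_two_labeling`, are hypothetical illustrations, not the paper's definitions. The two-label search here is brute force for clarity; it is not the linear-time algorithm the abstract refers to.

```python
from collections import defaultdict
from itertools import product


def build_preference_graph(sessions):
    """Turn click sessions into a weighted, directed preference graph.

    sessions: list of (ranked_urls, clicked_set) pairs for one query.
    Returns {(u, v): weight}, where an edge u -> v means "u was preferred to v".
    ASSUMPTION: a clicked URL beats each skipped URL ranked above it.
    """
    weights = defaultdict(int)
    for ranked_urls, clicked in sessions:
        for i, url in enumerate(ranked_urls):
            if url not in clicked:
                continue
            for above in ranked_urls[:i]:
                if above not in clicked:
                    weights[(url, above)] += 1
    return dict(weights)


def labeling_score(weights, labels):
    """ASSUMED objective: forward edge weight minus backward edge weight.

    An edge u -> v is "forward" when u has a strictly higher label than v,
    "backward" when it has a strictly lower label, and ignored on ties.
    """
    score = 0
    for (u, v), w in weights.items():
        if labels[u] > labels[v]:
            score += w
        elif labels[u] < labels[v]:
            score -= w
    return score


def best_two_labeling(weights):
    """Exhaustively search all 0/1 labelings (exponential; illustration only)."""
    nodes = sorted({n for edge in weights for n in edge})
    best = None
    for assignment in product((0, 1), repeat=len(nodes)):
        labels = dict(zip(nodes, assignment))
        s = labeling_score(weights, labels)
        if best is None or s > best[0]:
            best = (s, labels)
    return best


# Toy click log for a single query: three impressions of the same ranking.
sessions = [
    (["a", "b", "c"], {"c"}),
    (["a", "b", "c"], {"b"}),
    (["a", "b", "c"], {"c"}),
]
graph = build_preference_graph(sessions)
print(graph)
print(best_two_labeling(graph))
```

In this toy log, the URL that is clicked most often despite being ranked last accumulates the heaviest outgoing edges, so the sketch assigns it the higher of the two labels.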

Index Terms

Generating labels from clicks
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Published In

WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining

February 2009

314 pages

ISBN:9781605583907

DOI:10.1145/1498759

Editors:
Ricardo Baeza-Yates (Yahoo! Research, Spain)
Paolo Boldi (Universita degli Studi di Milano, Italy)
Berthier Ribeiro-Neto (Google Engineering, Brazil & CS Dept., Univ. Fed. de Minas Gerais, Brazil)
B. Barla Cambazoglu (Yahoo! Research)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Yahoo! Research
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Nokia
Google Inc.
SIGIR: ACM Special Interest Group on Information Retrieval
Microsoft

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WSDM'09

WSDM'09: Second ACM International Conference on Web Search and Web Data Mining

February 9 - 12, 2009

Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics

Total Citations: 46
Total Downloads: 726
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 1

Reflects downloads up to 09 Feb 2025
