skip to main content
10.1145/3209978.3210092acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
short-paper

Sogou-QCL: A New Dataset with Click Relevance Label

Published: 27 June 2018 Publication History

Abstract

Data is of vital importance in the development of machine learning technologies. Recently, within the information retrieval field, a number of neural ranking frameworks have been proposed to address the ad-hoc search. These models usually need a large amount of query-document relevance judgments for training. However, obtaining this kind of relevance judgments needs a lot of money and manual effort. To shed light on this problem, researchers seek to use implicit feedback from users of search engines to improve the ranking performance. In this paper, we present a new dataset, Sogou-QCL, which contains 537,366 queries and five kinds of weak relevance labels for over 12 million query-document pairs. We apply Sogou-QCL dataset to train recent neural ranking models and show its potential to serve as weak supervision for ranking. We believe that Sogou-QCL will have a broad impact on corresponding areas.

References

[1]
Olivier Chapelle and Ya Zhang. 2009. A dynamic bayesian network click model for web search ranking. In WWW '09.
[2]
Kevyn Collins-Thompson, Craig Macdonald, Paul Bennett, Fernando Diaz, and Ellen M Voorhees. 2015. TREC 2014 Web track overview. Technical Report.
[3]
Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, and W Bruce Croft. 2017. Neural ranking models with weak supervision. In SIGIR '17.
[4]
Georges E Dupret and Benjamin Piwowarski. 2008. A user browsing model to predict search engine click data from past observations. In SIGIR '08.
[5]
Yixing Fan, Liang Pang, JianPeng Hou, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. 2017. MatchZoo: A Toolkit for Deep Text Matching. arXiv preprint arXiv:1707.07270 (2017).
[6]
Jiafeng Guo, Yixing Fan, Qingyao Ai, and W Bruce Croft. 2016. A deep relevance matching model for ad-hoc retrieval. In CIKM '16.
[7]
Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In NIPS '14.
[8]
Tie-Yan Liu, Jun Xu, Tao Qin, Wenying Xiong, and Hang Li. 2007. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In SIGIR '07 workshop on learning to rank for information retrieval.
[9]
Yiqun Liu, Ruihua Song, Min Zhang, Zhicheng Dou, Takehiro Yamamoto, Makoto P Kato, Hiroaki Ohshima, and Ke Zhou. 2014. Overview of the NTCIR-11 IMine Task. In NTCIR '14.
[10]
Yiqun Liu, Xiaohui Xie, Chao Wang, Jian-Yun Nie, Min Zhang, and Shaoping Ma. 2017. Time-aware click model. TOIS 35, 3 (2017), 16.
[11]
Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou, Chenyan Xiong, and Jingfang Xu. 2017. Overview of the NTCIR-13 We Want Web task. Proc. NTCIR-13 (2017).
[12]
Sean MacAvaney, Kai Hui, and Andrew Yates. 2017. An Approach for WeaklySupervised Deep Information Retrieval. arXiv preprint arXiv:1707.00189 (2017).
[13]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS '13.
[14]
Greg Pass, Abdur Chowdhury, and Cayley Torgeson. 2006. A picture of search. In InfoScale '06.
[15]
Pavel Serdyukov, Georges Dupret, and Nick Craswell. 2014. Log-based personalization: The 4th web search click data (WSCD) workshop. In WSDM '14.
[16]
Chao Wang, Yiqun Liu, Meng Wang, Ke Zhou, Jian-yun Nie, and Shaoping Ma. 2015. Incorporating non-sequential behavior into click models. In SIGIR '15.
[17]
Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power. 2017. End-to-end neural ad-hoc ranking with kernel pooling. In SIGIR '17.
[18]
Wanhong Xu, Eren Manavoglu, and Erick Cantu-Paz. 2010. Temporal click model for sponsored search. In SIGIR '10.
[19]
Yuchen Zhang, Weizhu Chen, Dong Wang, and Qiang Yang. 2011. User-click modeling for understanding and predicting search-behavior. In SIGKDD '11.
[20]
Yuye Zhang and Alistair Moffat. 2006. Some Observations on User Search Behaviour. Austr. J. Intelligent Information Processing Systems 9, 2 (2006), 1--8.

Cited By

View all
  • (2024)CWRCzech: 100M Query-Document Czech Click Dataset and Its Application to Web Relevance RankingProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657851(1221-1231)Online publication date: 10-Jul-2024
  • (2023)Validating Synthetic Usage Data in Living Lab EnvironmentsJournal of Data and Information Quality10.1145/3623640Online publication date: 24-Sep-2023
  • (2023)PSLOG: Pretraining with Search Logs for Document RankingProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599477(2072-2082)Online publication date: 6-Aug-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
June 2018
1509 pages
ISBN:9781450356572
DOI:10.1145/3209978
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 June 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. document ranking
  2. search evaluation
  3. test collection

Qualifiers

  • Short-paper

Funding Sources

  • Natural Science Foundation of China
  • National Key Basic Research Program

Conference

SIGIR '18
Sponsor:

Acceptance Rates

SIGIR '18 Paper Acceptance Rate 86 of 409 submissions, 21%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)15
  • Downloads (Last 6 weeks)5
Reflects downloads up to 27 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)CWRCzech: 100M Query-Document Czech Click Dataset and Its Application to Web Relevance RankingProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657851(1221-1231)Online publication date: 10-Jul-2024
  • (2023)Validating Synthetic Usage Data in Living Lab EnvironmentsJournal of Data and Information Quality10.1145/3623640Online publication date: 24-Sep-2023
  • (2023)PSLOG: Pretraining with Search Logs for Document RankingProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599477(2072-2082)Online publication date: 6-Aug-2023
  • (2023)LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search EvaluationProceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3539618.3591921(3086-3094)Online publication date: 19-Jul-2023
  • (2023)T2Ranking: A Large-scale Chinese Benchmark for Passage RankingProceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3539618.3591874(2681-2690)Online publication date: 19-Jul-2023
  • (2022)A Cooperative Neural Information Retrieval Pipeline with Knowledge Enhanced Automatic Query ReformulationProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining10.1145/3488560.3498516(553-561)Online publication date: 11-Feb-2022
  • (2022)The Istella22 DatasetProceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3477495.3531740(3099-3107)Online publication date: 6-Jul-2022
  • (2022)Multi-CPRProceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3477495.3531736(3046-3056)Online publication date: 6-Jul-2022
  • (2022)SRTEF: Test Function Recommendation With Scenarios and Latent Semantic for Implementing Stepwise Test CaseIEEE Transactions on Reliability10.1109/TR.2022.316464571:2(1127-1140)Online publication date: Jun-2022
  • (2022)Understanding the role of human-inspired heuristics for retrieval modelsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-020-0016-y16:1Online publication date: 1-Feb-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media