short-paper

TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions

Authors:

Shaoping Ma

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Pages 2485 - 2488

https://doi.org/10.1145/3357384.3358158

Published: 03 November 2019

Abstract

Web search session data is valuable for a wide range of Information Retrieval (IR) tasks, such as session search, query suggestion, and click-through rate (CTR) prediction. Numerous studies have shown the great potential of using context information to optimize search systems. The well-known TREC Session Tracks have greatly advanced research in this area. However, their data were mainly collected through user studies or crowdsourcing experiments and typically contain only tens to thousands of sessions, which is insufficient for investigating more sophisticated models. To address this limitation, we present a new dataset that contains 147,155 refined web search sessions with both click-based and human-annotated relevance labels. The sessions are sampled from a large-scale search log and therefore reflect real search scenarios. The proposed dataset can support a wide range of session-level and task-based IR studies. As an example, we test several interactive search models with both the PSCM-based click relevance labels and the human relevance labels provided by this dataset, and report their performance as a reference for future studies of session search.
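To make the two label sources concrete, here is a minimal Python sketch of how a ranking could be scored against either click-based or human relevance labels. It is not the dataset's official tooling or file format: `Session`, `Query`, `click_labels`, `human_labels`, and `mean_ndcg` are hypothetical names, and the sketch assumes the click-based labels (for example, from PSCM) are already computed, since it does not implement PSCM itself. NDCG@k is used here as one common ranking metric, not necessarily the one the authors report.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Query:
    """One query in a session (hypothetical layout, not the released format)."""
    text: str
    ranked_doc_ids: list[str]  # documents in the order a model ranks them
    # Assumed precomputed click-model relevance (e.g. PSCM output), per doc id.
    click_labels: dict[str, float] = field(default_factory=dict)
    # Assumed graded human relevance judgments, per doc id.
    human_labels: dict[str, float] = field(default_factory=dict)


@dataclass
class Session:
    session_id: str
    queries: list[Query]


def dcg(gains: list[float]) -> float:
    """DCG with the common (2^g - 1) / log2(position + 1) form, positions from 1."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked: list[str], labels: dict[str, float], k: int) -> float:
    """NDCG@k; unlabeled documents get gain 0, the ideal uses all judged docs."""
    gains = [labels.get(d, 0.0) for d in ranked[:k]]
    ideal = sorted(labels.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0


def mean_ndcg(sessions: list[Session], k: int = 10, label_source: str = "human") -> float:
    """Average NDCG@k over all labeled queries, using one label source."""
    if label_source not in ("human", "click"):
        raise ValueError("label_source must be 'human' or 'click'")
    scores = []
    for session in sessions:
        for q in session.queries:
            labels = q.human_labels if label_source == "human" else q.click_labels
            if labels:  # skip queries with no labels of the chosen kind
                scores.append(ndcg_at_k(q.ranked_doc_ids, labels, k))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    q = Query("example query", ["d1", "d2", "d3"],
              click_labels={"d1": 0.2, "d2": 0.9},
              human_labels={"d1": 2, "d3": 1})
    sessions = [Session("s1", [q])]
    print("human NDCG@3:", mean_ndcg(sessions, k=3, label_source="human"))
    print("click NDCG@3:", mean_ndcg(sessions, k=3, label_source="click"))
```

For the toy ranking above, the two label sources disagree about which documents matter most, so the same ranking receives different scores; comparing a model under both label sets is the kind of check the abstract describes.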

Cited By

Huang Z, Zhu Y, Dou Z, Wen J (2025). CAGS: Context-Aware Document Ranking With Contrastive Graph Sampling. IEEE Transactions on Knowledge and Data Engineering 37:1 (89-101). DOI: 10.1109/TKDE.2024.3491996. Online publication date: Jan-2025.
https://doi.org/10.1109/TKDE.2024.3491996
Chen M, Liu C, Liu Z, Li Z, Sun J, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (2024). Identifiability matters. Proceedings of the 41st International Conference on Machine Learning (7057-7080). DOI: 10.5555/3692070.3692342. Online publication date: 21-Jul-2024.
https://dl.acm.org/doi/10.5555/3692070.3692342
Jianping L, Yingfei W, Jian W, Meng W, Xintao C (2024). A topic relevance-aware click model for web search. Journal of Intelligent & Fuzzy Systems 46:4 (8961-8974). DOI: 10.3233/JIFS-236894. Online publication date: 18-Apr-2024.
https://doi.org/10.3233/JIFS-236894

Index Terms

TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Relevance assessment
      2. Test collections
  2. World Wide Web
    1. Web mining
      1. Web log analysis

Information

Published In

ACM Conferences

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

November 2019

3373 pages

ISBN:9781450369763

DOI:10.1145/3357384

General Chairs:
Wenwu Zhu, Tsinghua University, China
Dacheng Tao, University of Massachusetts, USA
Xueqi Cheng, Institute of Computing Technology, CAS, China

Program Chairs:
Peng Cui, Tsinghua University, China
Elke Rundensteiner, Worcester Polytechnic Institute, USA
David Carmel, Amazon Research, USA
Qi He, LinkedIn, USA
Jeffrey Xu Yu, Chinese University of Hong Kong, China

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

Natural Science Foundation of China
National Key Research and Development Program of China

Conference

CIKM '19

CIKM '19: The 28th ACM International Conference on Information and Knowledge Management

November 3 - 7, 2019

Beijing, China

Acceptance Rates

CIKM '19 paper acceptance rate: 202 of 1,031 submissions (20%)

Overall acceptance rate: 1,861 of 8,427 submissions (22%)

Bibliometrics & Citations

Article Metrics

Total Citations: 37
Total Downloads: 225
Downloads (last 12 months): 38
Downloads (last 6 weeks): 2

Reflects downloads up to 20 Jan 2025.

Citations

Wu S, Tu Q, Zhong M, Liu H, Xu J, Gu J, Yan R, Serra E, Spezzano F (2024). Bridge the Gap between Past and Future: Siamese Model Optimization for Context-Aware Document Ranking. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (2564-2574). DOI: 10.1145/3627673.3679661. Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679661
Wu S, Tu Q, Liu H, Xu J, Liu Z, Zhang G, Wang R, Chen X, Yan R, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). Unify Graph Learning with Text: Unleashing LLM Potentials for Session Search. Proceedings of the ACM Web Conference 2024 (1509-1518). DOI: 10.1145/3589334.3645574. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645574
Chen H, Dou Z, Zhu Y, Wen J (2024). Query-Oriented Data Augmentation for Session Search. IEEE Transactions on Knowledge and Data Engineering 36:11 (6877-6888). DOI: 10.1109/TKDE.2024.3419131. Online publication date: 1-Nov-2024.
https://dl.acm.org/doi/10.1109/TKDE.2024.3419131
Liu J, Wang Y, Wang J, Wang M, Chu X (2024). Probabilistic graph model and neural network perspective of click models for web search. Knowledge and Information Systems 66:10 (5829-5873). DOI: 10.1007/s10115-024-02145-z. Online publication date: 6-Jun-2024.
https://doi.org/10.1007/s10115-024-02145-z
Zhu Z, Qin R, Huang J, Dai X, Yu Y, Yu Y, Zhang W (2023). Understanding or Manipulation: Rethinking Online Performance Gains of Modern Recommender Systems. ACM Transactions on Information Systems 42:4 (1-32). DOI: 10.1145/3637869. Online publication date: 15-Dec-2023.
https://dl.acm.org/doi/10.1145/3637869
Li Y, Liu Y, Dai X, Lin J, Lai H, Liu Y, Yu Y (2023). Adversarially Trained Environment Models Are Effective Policy Evaluators and Improvers - An Application to Information Retrieval. Proceedings of the Fifth International Conference on Distributed Artificial Intelligence (1-12). DOI: 10.1145/3627676.3627680. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3627676.3627680
Xiao T, Kveton B, Katariya S, Gangwani T, Rangi A (2023). Towards Sequential Counterfactual Learning to Rank. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (122-128). DOI: 10.1145/3624918.3625325. Online publication date: 26-Nov-2023.
https://dl.acm.org/doi/10.1145/3624918.3625325