DOI: 10.1145/3357384.3358158
short-paper

TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions

Published: 03 November 2019

Abstract

Web search session data is valuable for a wide range of Information Retrieval (IR) tasks, such as session search, query suggestion, and click-through rate (CTR) prediction. Numerous studies have shown the great potential of considering context information for search system optimization. The well-known TREC Session Tracks have greatly advanced development in this domain. However, their data are mainly collected via user studies or crowdsourcing experiments and normally contain only tens to thousands of sessions, which is insufficient for investigating more sophisticated models. To address this limitation, we present a new dataset that contains 147,155 refined web search sessions with both click-based and human-annotated relevance labels. The sessions are sampled from a large search log and thus reflect real search scenarios. The proposed dataset can support a wide range of session-level or task-based IR studies. As an example, we test several interactive search models with both the PSCM and human relevance labels provided by this dataset and report their performance as a reference for future studies of session search.

Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN: 9781450369763
DOI: 10.1145/3357384

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. information retrieval
  2. session search
  3. test collection

Qualifiers

  • Short-paper

Funding Sources

  • Natural Science Foundation of China
  • National Key Research and Development Program of China

Conference

CIKM '19

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
