DOI: 10.1145/3357384.3358158
short-paper

TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions

Published: 03 November 2019

ABSTRACT

Web search session data is valuable for a wide range of Information Retrieval (IR) tasks, such as session search, query suggestion, and click-through rate (CTR) prediction. Numerous studies have shown the great potential of considering context information for search system optimization. The well-known TREC Session Tracks have advanced this domain to a great extent. However, their data are mainly collected via user studies or crowdsourcing experiments and normally contain only tens to thousands of sessions, which is insufficient for investigating more sophisticated models. To tackle this obstacle, we present a new dataset that contains 147,155 refined web search sessions with both click-based and human-annotated relevance labels. The sessions are sampled from a large search log and thus reflect real search scenarios. The proposed dataset can support a wide range of session-level or task-based IR studies. As an example, we test several interactive search models with both the PSCM and human relevance labels provided by this dataset and report their performance as a reference for future studies of session search.


          Published in
          CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
          November 2019
          3373 pages
          ISBN:9781450369763
          DOI:10.1145/3357384

          Copyright © 2019 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          CIKM '19 Paper Acceptance Rate: 202 of 1,031 submissions, 20%
          Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
