ABSTRACT
Web search session data is valuable for a wide range of Information Retrieval (IR) tasks, such as session search, query suggestion, and click-through rate (CTR) prediction. Numerous studies have shown the great potential of using context information to optimize search systems. The well-known TREC Session Tracks have greatly advanced research in this domain. However, their data are mainly collected via user studies or crowdsourcing experiments and normally contain only tens to thousands of sessions, which is insufficient for investigating more sophisticated models. To tackle this obstacle, we present a new dataset that contains 147,155 refined web search sessions with both click-based and human-annotated relevance labels. The sessions are sampled from a large-scale search log and thus reflect real search scenarios. The proposed dataset can support a wide range of session-level or task-based IR studies. As an example, we test several interactive search models with both the PSCM and human relevance labels provided by this dataset and report their performance as a reference for future studies of session search.
Index Terms
- TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions