MIMICS: A Large-Scale Data Collection for Search Clarification

ABSTRACT
Search clarification has recently attracted much attention due to its applications in search engines. It has also been recognized as a major component in conversational information seeking systems. Despite its importance, the research community still lacks a large-scale dataset for studying different aspects of search clarification. In this paper, we introduce MIMICS, a collection of search clarification datasets for real web search queries sampled from the Bing query logs. Each clarification in MIMICS is generated by a Bing production algorithm and consists of a clarifying question and up to five candidate answers. MIMICS contains three datasets: (1) MIMICS-Click includes over 400k unique queries, their associated clarification panes, and the corresponding aggregated user interaction signals (i.e., clicks). (2) MIMICS-ClickExplore is an exploration dataset that includes aggregated user interaction signals for over 60k unique queries, each with multiple clarification panes. (3) MIMICS-Manual includes over 2k unique real search queries. Each query-clarification pair in this dataset has been manually labeled by at least three trained annotators, and contains graded quality labels for the clarifying question, the candidate answer set, and the landing result page for each candidate answer.
MIMICS is publicly available for research purposes, enabling researchers to study a number of tasks related to search clarification, including clarification generation and selection, user engagement prediction for clarification, click models for clarification, and analysis of user interactions with search clarification. We also release the results returned by Bing's web search API for all the queries in MIMICS, allowing researchers to utilize search results for tasks related to search clarification.
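To make the described record structure concrete, the following sketch parses one MIMICS-Click-style row into a query, its clarifying question, its candidate answers, and an aggregated engagement signal. The TSV layout, column names (`option_1` ... `option_5`, `engagement_level`), and sample values are illustrative assumptions, not the dataset's exact schema.

```python
import csv
import io

# A hypothetical TSV row mimicking the structure described in the abstract:
# a query, a clarifying question, up to five candidate answers, and an
# aggregated user engagement signal. Column names are illustrative only.
SAMPLE_TSV = (
    "query\tquestion\toption_1\toption_2\toption_3\toption_4\toption_5"
    "\tengagement_level\n"
    "headaches\tWhat do you want to know about this medical condition?"
    "\tsymptoms\ttreatments\tcauses\t\t\t7\n"
)

def parse_clarification_records(tsv_text):
    """Parse TSV rows into dicts, dropping empty candidate-answer slots."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        # Keep only the non-empty candidate answers (at most five).
        answers = [row[f"option_{i}"] for i in range(1, 6) if row[f"option_{i}"]]
        records.append({
            "query": row["query"],
            "question": row["question"],
            "answers": answers,
            "engagement_level": int(row["engagement_level"]),
        })
    return records

records = parse_clarification_records(SAMPLE_TSV)
print(records[0]["answers"])  # ['symptoms', 'treatments', 'causes']
```

The same parsing logic would apply to a file-backed reader; only the source of the TSV text changes.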