DOI: 10.1145/3340531.3412772 · CIKM '20 · research-article

MIMICS: A Large-Scale Data Collection for Search Clarification

Published: 19 October 2020

ABSTRACT

Search clarification has recently attracted much attention due to its applications in search engines. It has also been recognized as a major component of conversational information seeking systems. Despite its importance, the research community still lacks a large-scale dataset for studying different aspects of search clarification. In this paper, we introduce MIMICS, a collection of search clarification datasets for real web search queries sampled from the Bing query logs. Each clarification in MIMICS is generated by a Bing production algorithm and consists of a clarifying question and up to five candidate answers. MIMICS contains three datasets: (1) MIMICS-Click includes over 400k unique queries, their associated clarification panes, and the corresponding aggregated user interaction signals (i.e., clicks). (2) MIMICS-ClickExplore is an exploration dataset that includes aggregated user interaction signals for over 60k unique queries, each with multiple clarification panes. (3) MIMICS-Manual includes over 2k unique real search queries; each query-clarification pair in this dataset has been manually labeled by at least three trained annotators, with graded quality labels for the clarifying question, the candidate answer set, and the landing result page for each candidate answer.

MIMICS is publicly available for research purposes, thus enabling researchers to study a number of tasks related to search clarification, including clarification generation and selection, user engagement prediction for clarification, click models for clarification, and analysis of user interactions with search clarification. We also release the results returned by the Bing web search API for all the queries in MIMICS, allowing researchers to utilize search results for tasks related to search clarification.
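The clarification structure described above (a query, a clarifying question, up to five candidate answers, and aggregated click signals) can be sketched as a simple record type. This is an illustrative sketch only: the class name, field names, and example values below are hypothetical and do not reflect the released file schema.

```python
from dataclasses import dataclass

@dataclass
class ClarificationPane:
    """Hypothetical record mirroring one MIMICS-style clarification pane."""
    query: str            # the web search query
    question: str         # the clarifying question shown to the user
    options: list         # up to five candidate answers
    option_clicks: list   # aggregated click counts, one per candidate answer

    def __post_init__(self):
        # MIMICS clarifications carry up to five candidate answers.
        assert 1 <= len(self.options) <= 5
        assert len(self.options) == len(self.option_clicks)

    def click_distribution(self):
        """Normalize aggregated clicks into a per-answer click distribution."""
        total = sum(self.option_clicks)
        if total == 0:
            return [0.0] * len(self.options)
        return [c / total for c in self.option_clicks]

# Illustrative values, not taken from the released data.
pane = ClarificationPane(
    query="headaches",
    question="What do you want to know about this medical condition?",
    options=["symptoms", "treatment", "causes", "diagnosis"],
    option_clicks=[10, 60, 20, 10],
)
print(pane.click_distribution())  # [0.1, 0.6, 0.2, 0.1]
```

A normalized distribution like this is the kind of signal a click model or engagement predictor over MIMICS-Click would consume.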


Supplemental Material: 3340531.3412772.mp4 (73.4 MB)


          • Published in

            cover image ACM Conferences
            CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
            October 2020
            3619 pages
            ISBN:9781450368599
            DOI:10.1145/3340531

            Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
