research-article

Revisiting Open Domain Query Facet Extraction and Generation

Authors:
Chris Samarinas, University of Massachusetts Amherst, Amherst, MA, USA
Arkin Dharawat, University of Massachusetts Amherst, Amherst, MA, USA
Hamed Zamani, University of Massachusetts Amherst, Amherst, MA, USA
ICTIR '22: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, August 2022, Pages 43–50. https://doi.org/10.1145/3539813.3545138

Published: 25 August 2022


ABSTRACT

Web search queries can often be characterized by various facets. Extracting and generating query facets has many real-world applications, such as displaying facets to users in a search interface, search result diversification, clarifying question generation, and enabling exploratory search. In this work, we revisit the task of query facet extraction and generation and study several formulations of it: facet extraction as sequence labeling, and facet generation as either autoregressive text generation or extreme multi-label classification. We conduct extensive experiments and demonstrate that these approaches produce complementary sets of facets. We also explore several aggregation approaches, based on relevance and diversity, for combining the facet sets produced by the different formulations. The approaches presented in this paper outperform state-of-the-art baselines in terms of both precision and recall, and we confirm the quality of the proposed methods through manual annotation. Since there is no open-source software for facet extraction and generation, we release a toolkit named Faspect, which includes implementations of various models for this task.
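The abstract does not spell out how the relevance- and diversity-based aggregation works. One plausible reading, sketched below purely as an illustration, is a greedy Maximal Marginal Relevance (MMR) pass over the union of facets produced by the different formulations. The per-facet relevance scores, the token-level Jaccard similarity, the weight `lam`, and the function names `jaccard` and `mmr_aggregate` are all assumptions made for this sketch; they are not Faspect's API and not necessarily the paper's exact method.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two facet strings (illustrative choice)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def mmr_aggregate(
    facet_sets: List[Dict[str, float]], k: int = 5, lam: float = 0.7
) -> List[str]:
    """Greedily select up to k facets from the union of several facet sets.

    Assumption: each facet set maps a facet string to a relevance score in
    [0, 1] (e.g., a model confidence). Facets are merged case-insensitively,
    keeping the highest score. At each step the candidate maximizing
        lam * relevance - (1 - lam) * max similarity to selected facets
    is added, trading relevance off against redundancy.
    """
    # Merge candidates from all formulations, keeping the best score per facet.
    pool: Dict[str, float] = {}
    for facets in facet_sets:
        for facet, score in facets.items():
            key = facet.strip().lower()
            pool[key] = max(pool.get(key, 0.0), score)

    selected: List[str] = []
    candidates = set(pool)
    while candidates and len(selected) < k:
        def mmr_score(f: str) -> float:
            redundancy = max((jaccard(f, s) for s in selected), default=0.0)
            return lam * pool[f] - (1 - lam) * redundancy

        # Sort first so ties are broken deterministically (alphabetically).
        best = max(sorted(candidates), key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


extracted = {"red": 0.9, "blue": 0.8, "red dress": 0.6}  # e.g., from sequence labeling
generated = {"blue": 0.85, "green": 0.7, "size": 0.4}    # e.g., from a generator
print(mmr_aggregate([extracted, generated], k=3))  # ['red', 'blue', 'green']
```

With `lam=1.0` this reduces to ranking by relevance alone; lowering `lam` penalizes facets that overlap lexically with ones already chosen (here "red dress" is demoted once "red" is selected because the two share a token).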


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
2. Information systems
  1. Information retrieval
    1. Information retrieval query processing
      1. Query intent
    2. Retrieval tasks and goals
      1. Information extraction


Published in
ICTIR '22: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval
August 2022
289 pages
ISBN: 9781450394123
DOI: 10.1145/3539813
Program Chairs:
Fabio Crestani, Università della Svizzera Italiana - USI, Switzerland
Gabriella Pasi, Univ. Milano-Bicocca, Italy
Eric Gaussier, Univ. Grenoble-Alpes, France
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
facet extraction
facet generation
language models

Acceptance Rates
ICTIR '22 paper acceptance rate: 32 of 80 submissions, 40%. Overall acceptance rate: 209 of 482 submissions, 43%.