ABSTRACT
Web search queries can often be characterized by various facets. Extracting and generating query facets has many real-world applications, such as displaying facets to users in a search interface, search result diversification, clarifying question generation, and enabling exploratory search. In this work, we revisit the task of query facet extraction and generation and study several formulations of this task: facet extraction as sequence labeling, and facet generation as either autoregressive text generation or extreme multi-label classification. We conduct extensive experiments and demonstrate that these approaches lead to complementary sets of facets. We also explore various aggregation approaches based on relevance and diversity to combine the facet sets produced by the different formulations of the task. The approaches presented in this paper outperform state-of-the-art baselines in terms of both precision and recall. We confirm the quality of the proposed methods through manual annotation. Since there is no open-source software for facet extraction and generation, we release a toolkit named Faspect, which includes various model implementations for this task.
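To make the idea of combining facet sets by relevance and diversity concrete, the following is a minimal sketch and not the paper's actual aggregation method or the Faspect API. It assumes each formulation returns facets with a relevance score in [0, 1], pools them, and then selects greedily in the style of maximal marginal relevance. The function names `jaccard` and `aggregate_facets`, the trade-off parameter `lam`, the token-overlap similarity, and the example scores are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two facet strings (illustrative stand-in)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def aggregate_facets(
    scored_sets: List[Dict[str, float]], k: int = 5, lam: float = 0.7
) -> List[str]:
    """Greedy MMR-style selection over facets pooled from several models.

    scored_sets: one {facet: relevance} dict per formulation (hypothetical scores).
    lam: weight on relevance; (1 - lam) weights the redundancy penalty.
    """
    # Pool facets from all formulations, keeping the best score per facet.
    pool: Dict[str, float] = {}
    for facets in scored_sets:
        for facet, score in facets.items():
            key = facet.strip().lower()
            pool[key] = max(pool.get(key, 0.0), score)

    selected: List[str] = []
    while pool and len(selected) < k:

        def mmr(facet: str) -> float:
            # Redundancy is the highest similarity to any facet already chosen.
            redundancy = max((jaccard(facet, s) for s in selected), default=0.0)
            return lam * pool[facet] - (1 - lam) * redundancy

        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(pool), key=mmr)
        selected.append(best)
        del pool[best]
    return selected


if __name__ == "__main__":
    extraction = {"cheap flights": 0.9, "flight status": 0.6}
    generation = {"cheap flights": 0.8, "cheap flights deals": 0.85, "airports": 0.5}
    print(aggregate_facets([extraction, generation], k=3, lam=0.5))
```

With `lam=0.5`, the near-duplicate "cheap flights deals" is passed over in favor of less redundant facets, whereas `lam=1.0` reduces the selection to a pure relevance ranking. In practice the token-overlap similarity would likely be replaced by something stronger, such as embedding similarity.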