ABSTRACT
Search systems in online content platforms are typically biased toward a minority of highly consumed items, reflecting the most common user behavior of navigating toward content that is already familiar and popular. Query suggestions are a powerful tool to support query formulation and to encourage exploratory search and content discovery. However, classic approaches to query suggestion typically rely either on semantic similarity, which lacks diversity and does not reflect user searching behavior, or on a collaborative similarity measure mined from search logs, which suffers from data sparsity and is biased toward highly popular queries. In this work, we argue that query suggestion can be modelled as a link prediction task on a heterogeneous graph of queries and documents, enabling graph learning methods to generate query suggestions that encompass both semantic and collaborative information. We perform an offline evaluation on an internal Spotify dataset of search logs and on two public datasets, showing that node2vec leads to an accurate and diversified set of results, especially on the large-scale real-world data. We then describe the implementation in an instant search scenario and discuss a set of additional challenges tied to the specific production environment. Finally, we report the results of a large-scale A/B test involving millions of users and show that node2vec query suggestions lead to an increase in online metrics such as coverage (+1.42% search result pages shown with suggestions) and engagement (+1.21% clicks), with a particularly notable boost in the number of clicks on exploratory search queries (+9.37%).
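To make the graph framing concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes a toy click log (the queries and `doc:` identifiers are hypothetical), builds an undirected query-document graph, and runs node2vec-style second-order random walks with return parameter `p` and in-out parameter `q`. As a deliberate simplification, it ranks candidate queries by how often they are visited on walks started from the input query, standing in for the skip-gram embedding and nearest-neighbor step that node2vec would normally use. The function names `build_graph`, `biased_walk`, and `suggest` are illustrative only.

```python
import random
from collections import Counter, defaultdict

# Toy click log (hypothetical data): (query, clicked document) pairs.
CLICKS = [
    ("jazz", "doc:miles_davis"),
    ("jazz", "doc:john_coltrane"),
    ("bebop", "doc:miles_davis"),
    ("bebop", "doc:charlie_parker"),
    ("cool jazz", "doc:miles_davis"),
    ("cool jazz", "doc:john_coltrane"),
    ("metal", "doc:metallica"),
    ("thrash", "doc:metallica"),
]


def build_graph(clicks):
    """Build an undirected query-document graph as adjacency sets."""
    graph = defaultdict(set)
    for query, doc in clicks:
        graph[query].add(doc)
        graph[doc].add(query)
    return graph


def biased_walk(graph, start, length, p, q, rng):
    """One node2vec-style walk: 1/p to return, 1 to stay close, 1/q to move away."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == previous:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in graph[previous]:
                weights.append(1.0)       # stay at distance 1 from previous
            else:
                weights.append(1.0 / q)   # move further away from previous
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def suggest(graph, query, k=3, walks=200, length=6, p=1.0, q=1.0, seed=0):
    """Rank other queries by visit counts on walks started from `query`.

    Assumption: visit counts replace the learned embeddings of node2vec.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(walks):
        for node in biased_walk(graph, query, length, p, q, rng)[1:]:
            if node != query and not node.startswith("doc:"):
                counts[node] += 1
    return [candidate for candidate, _ in counts.most_common(k)]


graph = build_graph(CLICKS)
print(suggest(graph, "jazz"))
print(suggest(graph, "metal"))
```

In this toy graph, suggestions for a query can only come from queries reachable through shared documents, which is the collaborative signal the abstract refers to. A fuller version would also add semantic edges or features and replace the visit counts with trained embeddings.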