DOI: 10.1145/3511808.3557068
research-article

Query Rewriting in TaoBao Search

Published: 17 October 2022

Abstract

In e-commerce search engines, query rewriting (QR) is a crucial technique that improves the shopping experience by narrowing the vocabulary gap between user queries and the product catalog. Recent work has mainly adopted the generative paradigm. However, generative methods can hardly guarantee high-quality rewrites and do not consider personalization, which degrades search relevance. In this work, we present Contrastive Learning Enhanced Query Rewriting (CLE-QR), the solution used in Taobao Product Search. It uses a novel contrastive-learning-enhanced architecture based on "query retrieval → semantic relevance ranking → online ranking", and finds rewrites among hundreds of millions of historical queries while considering both relevance and personalization. Specifically, we first alleviate the representation degeneration problem in the query retrieval stage with an unsupervised contrastive loss, and then propose an interaction-aware matching method to find beneficial and incremental candidates, improving the quality and relevance of candidate queries. We then present a relevance-oriented contrastive pre-training paradigm on noisy user feedback data to improve semantic ranking performance. Finally, we rank these candidates online using the user profile to model personalization, so that more relevant products are retrieved. We evaluate CLE-QR on Taobao Product Search, one of the largest e-commerce platforms in China, and observe significant metric gains in online A/B tests. CLE-QR has been deployed in our large-scale commercial retrieval system and has served hundreds of millions of users since December 2021. We also introduce its online deployment scheme and share practical lessons and optimization tricks from our lexical match system.
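The abstract states only that the query retrieval stage uses "an unsupervised contrastive loss" to counter representation degeneration; it does not give the loss itself. A minimal sketch, assuming a SimCSE-style objective (Gao et al., 2021): each query is embedded twice (for example with two different dropout masks), the two views form a positive pair, and the other queries in the batch serve as negatives under an InfoNCE loss with cosine similarity. The function names `cosine` and `unsupervised_infonce` and the temperature of 0.05 are illustrative assumptions, not details of CLE-QR.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def unsupervised_infonce(views_a: List[Vector], views_b: List[Vector],
                         temperature: float = 0.05) -> float:
    """Mean in-batch InfoNCE loss (hypothetical sketch, not the exact CLE-QR loss).

    views_a[i] and views_b[i] are two embeddings of the same query, e.g. two
    forward passes with different dropout masks as in SimCSE. For query i,
    views_b[i] is the positive and every views_b[j] with j != i is a negative.
    """
    if len(views_a) != len(views_b) or not views_a:
        raise ValueError("need two non-empty batches of equal size")
    total = 0.0
    for i, anchor in enumerate(views_a):
        # Similarity of the anchor to every second view, scaled by temperature.
        logits = [cosine(anchor, other) / temperature for other in views_b]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # -log softmax probability assigned to the positive pair.
        total += log_denominator - logits[i]
    return total / len(views_a)
```

If every query collapses to the same vector, all logits are equal and the loss is log N for a batch of N queries, so the objective can only decrease by spreading query embeddings apart. That is the usual intuition for applying such a loss against degenerate representations.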

Cited By

  • (2024) Embedding Based Deduplication in E-commerce AutoComplete. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2955-2959. DOI: 10.1145/3626772.3661373. Online publication date: 10 Jul 2024.
  • (2024) COSMO: A Large-Scale E-commerce Common Sense Knowledge Generation and Serving System at Amazon. Companion of the 2024 International Conference on Management of Data, pp. 148-160. DOI: 10.1145/3626246.3653398. Online publication date: 9 Jun 2024.
  • (2024) Large Language Model based Long-tail Query Rewriting in Taobao Search. Companion Proceedings of the ACM on Web Conference 2024, pp. 20-28. DOI: 10.1145/3589335.3648298. Online publication date: 13 May 2024.
  • (2024) Data augmented large language models for medical record generation. Applied Intelligence 55(2). DOI: 10.1007/s10489-024-05934-9. Online publication date: 6 Dec 2024.

Information

Published In

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022
5274 pages
ISBN:9781450392365
DOI:10.1145/3511808
General Chairs: Mohammad Al Hasan, Li Xiong
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. e-commerce search
  2. lexical match
  3. query rewriting

Qualifiers

  • Research-article

Conference

CIKM '22

Acceptance Rates

CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 182
  • Downloads (last 6 weeks): 11
Reflects downloads up to 14 Feb 2025