research-article

Query Rewriting in TaoBao Search

Authors:

Qianli Ma

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

Pages 3262 - 3271

https://doi.org/10.1145/3511808.3557068

Published: 17 October 2022

Abstract

In e-commerce search engines, query rewriting (QR) is a crucial technique that improves the shopping experience by reducing the vocabulary gap between user queries and the product catalog. Recent works have mainly adopted the generative paradigm. However, they can hardly guarantee high-quality rewrites and do not consider personalization, which degrades search relevance. In this work, we present Contrastive Learning Enhanced Query Rewriting (CLE-QR), the solution used in Taobao product search. It uses a novel contrastive-learning-enhanced architecture based on three stages: query retrieval, semantic relevance ranking, and online ranking. It finds rewrites among hundreds of millions of historical queries while considering both relevance and personalization. Specifically, we first alleviate the representation degeneration problem in the query retrieval stage by using an unsupervised contrastive loss, and then propose an interaction-aware matching method to find beneficial and incremental candidates, thus improving the quality and relevance of candidate queries. We then present a relevance-oriented contrastive pre-training paradigm on noisy user feedback data to improve semantic ranking performance. Finally, we rank these candidates online with the user profile to model personalization, so that more relevant products are retrieved. We evaluate CLE-QR on Taobao Product Search, one of the largest e-commerce platforms in China. Significant metric gains are observed in online A/B tests. CLE-QR has been deployed in our large-scale commercial retrieval system and has served hundreds of millions of users since December 2021. We also introduce its online deployment scheme and share practical lessons and optimization tricks for our lexical match system.
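
The abstract mentions an unsupervised contrastive loss for the query retrieval stage but does not give its form. As a rough illustration only, the sketch below shows a generic in-batch InfoNCE loss of the kind popularized by SimCLR and SimCSE, written with NumPy. The function name `info_nce_loss`, the temperature value, and the random toy embeddings are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def info_nce_loss(view_a, view_b, temperature=0.05):
    """Generic in-batch InfoNCE loss (illustrative, not the paper's exact loss).

    Row i of view_a is treated as a positive pair with row i of view_b;
    every other row of view_b in the batch acts as a negative.
    """
    # L2-normalize so that the dot product is cosine similarity.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positives sit on the diagonal of the similarity matrix.
    return float(-np.mean(np.diag(log_probs)))

# Toy data (assumption): 8 "query embeddings" and a lightly perturbed second view.
rng = np.random.default_rng(0)
queries = rng.normal(size=(8, 16))
noisy_views = queries + 0.01 * rng.normal(size=queries.shape)
shuffled_views = noisy_views[::-1]  # breaks the row-to-row pairing

aligned_loss = info_nce_loss(queries, noisy_views)
misaligned_loss = info_nce_loss(queries, shuffled_views)
print(aligned_loss < misaligned_loss)
```

Because every other query in the batch serves as a negative, minimizing a loss of this kind pushes unrelated embeddings apart, which is the usual intuition for why contrastive objectives counteract representation degeneration. How CLE-QR builds its positive pairs is not described in the abstract.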

Index Terms

Query Rewriting in TaoBao Search
1. Applied computing
  1. Electronic commerce
    1. Online shopping
2. Information systems
  1. Information retrieval
    1. Information retrieval query processing
      1. Query reformulation

Published In

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

October 2022

5274 pages

ISBN: 9781450392365

DOI: 10.1145/3511808

General Chairs:
Mohammad Al Hasan (Indiana University Purdue University, Indianapolis, USA)
Li Xiong (Emory University, Atlanta, USA)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM '22: The 31st ACM International Conference on Information and Knowledge Management

October 17 - 21, 2022

Atlanta, GA, USA

Acceptance Rates

CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 4
Total Downloads: 772
Downloads (Last 12 months): 182
Downloads (Last 6 weeks): 11

Reflects downloads up to 14 Feb 2025

Cited By

Zhai S, Chen Y, Li Y, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). Embedding Based Deduplication in E-commerce AutoComplete. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2955-2959. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3661373

Yu C, Liu X, Maia J, Li Y, Cao T, Gao Y, Song Y, Goutam R, Zhang H, Yin B, Li Z, Barcelo P, Sanchez-Pi N, Meliou A, Sudarshan S (2024). COSMO: A Large-Scale E-commerce Common Sense Knowledge Generation and Serving System at Amazon. Companion of the 2024 International Conference on Management of Data, 148-160. Online publication date: 9-Jun-2024.
https://dl.acm.org/doi/10.1145/3626246.3653398

Peng W, Li G, Jiang Y, Wang Z, Ou D, Zeng X, Xu D, Xu T, Chen E, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). Large Language Model based Long-tail Query Rewriting in Taobao Search. Companion Proceedings of the ACM on Web Conference 2024, 20-28. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589335.3648298

Zhang X, Zhao G, Ren Y, Wang W, Cai W, Zhao Y, Zhang X, Liu J (2024). Data augmented large language models for medical record generation. Applied Intelligence, 55:2. Online publication date: 6-Dec-2024.
https://doi.org/10.1007/s10489-024-05934-9
