DOI: 10.1145/3640457.3688039

Embedding based retrieval for long tail search queries in ecommerce

Published: 08 October 2024

Abstract

In this abstract we present a series of optimizations we performed on the two-tower model architecture [14] and on the training and evaluation datasets to implement semantic product search at Best Buy. Search queries on bestbuy.com follow a Pareto distribution, whereby a minority of queries account for most searches. This leaves a long tail of search queries that are issued infrequently, and these long-tail queries suffer from very sparse interaction signals. Our current work focuses on building a model to serve the long-tail queries. We describe the optimizations we made to this model to maximize conversion when retrieving from the catalog.
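
As a point of reference, the sketch below illustrates the general two-tower retrieval setup cited above [14]: a query tower and a product tower each map text to a normalized embedding, relevance is their dot product, and training uses in-batch negatives. This is a minimal, hypothetical PyTorch illustration; the bag-of-tokens encoders, dimensions, and loss are placeholders so the snippet runs standalone, not Best Buy's production configuration (in practice each tower would be a pretrained transformer, e.g. in the style of [12, 13]).

```python
# Minimal two-tower retrieval sketch (illustrative only, not the production model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextTower(nn.Module):
    """Toy text encoder; a real system would use a pretrained transformer here."""
    def __init__(self, vocab_size: int = 30000, dim: int = 256):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) of tokenized/hashed ids
        x = self.proj(self.embed(token_ids))
        return F.normalize(x, dim=-1)  # unit-length embeddings for dot-product scoring

class TwoTowerModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.query_tower = TextTower()    # encodes search queries
        self.product_tower = TextTower()  # encodes product titles/descriptions

    def forward(self, query_ids, product_ids):
        q = self.query_tower(query_ids)       # (B, dim)
        p = self.product_tower(product_ids)   # (B, dim)
        return q @ p.T                        # (B, B) similarity matrix

def in_batch_softmax_loss(scores: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    # Each query's matched product sits on the diagonal; other products in the batch act as negatives.
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores / temperature, labels)

if __name__ == "__main__":
    model = TwoTowerModel()
    query_ids = torch.randint(0, 30000, (8, 12))    # 8 toy queries
    product_ids = torch.randint(0, 30000, (8, 64))  # their matched products
    loss = in_batch_softmax_loss(model(query_ids, product_ids))
    loss.backward()
    print(f"toy loss: {loss.item():.4f}")
```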
The first optimization uses a large language model to mitigate the sparsity of conversion signals. The second is pretraining an off-the-shelf transformer-based model on Best Buy catalog data. The third optimization is on the finetuning front: we use query-to-query pairs in addition to query-to-product pairs, combining the above strategies for finetuning the model. We also demonstrate how merging the weights of these finetuned models improves the evaluation metrics (see the sketch below). Finally, we provide a recipe for curating an evaluation dataset for continuous monitoring of model performance with human-in-the-loop evaluation. Adding this recall mechanism to our existing term-match-based recall improved conversion by 3% in an online A/B test.
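
For the weight-merging step, a minimal sketch in the spirit of model soups [10] is shown below. Uniform averaging of state dicts is an assumption made for illustration (the abstract does not specify the exact recipe), and the checkpoint names are hypothetical.

```python
# Uniform "model soup" style weight averaging [10] (illustrative sketch).
# Assumes the finetuned models share the same architecture and state_dict keys,
# e.g. one model finetuned on query-to-product pairs and one on query-to-query pairs.
from collections.abc import Iterable
import torch

def average_state_dicts(state_dicts: Iterable[dict]) -> dict:
    state_dicts = list(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        # Element-wise mean of the same parameter across all finetuned checkpoints.
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged

# Hypothetical usage with the TwoTowerModel sketch above:
# m_qp = TwoTowerModel(); m_qp.load_state_dict(torch.load("finetuned_query_product.pt"))
# m_qq = TwoTowerModel(); m_qq.load_state_dict(torch.load("finetuned_query_query.pt"))
# soup = TwoTowerModel()
# soup.load_state_dict(average_state_dicts([m_qp.state_dict(), m_qq.state_dict()]))
```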

References

[1] D. K. Harman. 2005. The TREC ad hoc experiments. In TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing). MIT Press, 79–98.
[2] B. Carterette and R. Jones. 2008. Evaluating search engines by modeling the relationship between relevance and clicks. In Advances in Neural Information Processing Systems, Vol. 20, 217–224.
[3] Han Zhang et al. 2020. Towards personalized and semantic retrieval: An end-to-end solution for e-commerce search via embedding learning. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval.
[4] Sen Li et al. 2021. Embedding-based product retrieval in Taobao search. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining.
[5] Yiding Liu et al. 2021. Pre-trained language model for web-scale retrieval in Baidu search. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining.
[6] Alessandro Magnani, Feng Liu, Suthee Chaidaroon, Sachin Yadav, Praveen Reddy Suram, Ajit Puthenputhussery, Sijie Chen, Min Xie, Anirudh Kashi, Tony Lee, and Ciya Liao. 2022. Semantic retrieval at Walmart. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22). Association for Computing Machinery, New York, NY, USA, 3495–3503. https://doi.org/10.1145/3534678.3539164
[7] Priyanka Nigam et al. 2019. Semantic product search. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
[8] Matěj Kocián et al. 2022. Siamese BERT-based model for web search relevance ranking evaluated on a new Czech dataset. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, No. 11.
[9] Yiqun Liu, Kaushik Rangadurai, Yunzhong He, Siddarth Malreddy, Xunlong Gui, Xiaoyi Liu, and Fedor Borisyuk. 2021. Que2Search: Fast and accurate query and document understanding for search at Facebook. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD '21). Association for Computing Machinery, New York, NY, USA, 3376–3384. https://doi.org/10.1145/3447548.3467127
[10] Mitchell Wortsman et al. 2022. Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In International Conference on Machine Learning. PMLR.
[11] Zheng Liu et al. 2022. Towards generalizable semantic product search by text similarity pre-training on search click logs. arXiv preprint arXiv:2204.05231.
[12] Yinhan Liu et al. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
[13] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084.
[14] Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM '13). Association for Computing Machinery, New York, NY, USA, 2333–2338. https://doi.org/10.1145/2505515.2505665
[15] Karen Sparck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28, 1, 11–21.

Published In

RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems
October 2024
1438 pages
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document representation
  2. multi-task learning
  3. query representation
  4. search relevance dataset curation
  5. semantic search
  6. synthetic training dataset
  7. two tower architecture

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Conference

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%
