short-paper

Applications and Future of Dense Retrieval in Industry

Author:

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 3373 - 3374

https://doi.org/10.1145/3477495.3536324

Published: 07 July 2022

Abstract

Large-scale search engines are often designed as tiered systems with at least two layers. The L1 candidate retrieval layer efficiently generates a subset of potentially relevant documents (typically ~1000 documents) from a corpus many orders of magnitude larger. L1 systems emphasize efficiency and are designed to maximize recall. The L2 re-ranking layer applies a more computationally expensive but more accurate model (e.g., a learning-to-rank or neural model) to re-rank the L1 candidates and maximize the precision of the final result list.
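The two-tier cascade can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture of any particular engine: `cascade_search`, `k1`, `k2`, and the two toy scorers are hypothetical names, and the L1 stage here scans the whole corpus for simplicity, whereas a production L1 would query an inverted or ANN index instead of scoring every document.

```python
import heapq
from typing import Callable, List, Sequence

Scorer = Callable[[str, str], float]


def cascade_search(
    query: str,
    corpus: Sequence[str],
    l1_score: Scorer,
    l2_score: Scorer,
    k1: int = 1000,
    k2: int = 10,
) -> List[int]:
    """Two-tier retrieval: cheap L1 candidate generation, then costly L2 re-ranking.

    Returns the indices of the top-k2 documents after re-ranking.
    """
    # L1 (recall-oriented): keep the k1 best documents under the cheap scorer.
    # Simplification: a linear scan stands in for an index lookup.
    candidates = heapq.nlargest(
        k1, range(len(corpus)), key=lambda i: l1_score(query, corpus[i])
    )
    # L2 (precision-oriented): run the expensive scorer on the k1 candidates only.
    reranked = sorted(
        candidates, key=lambda i: l2_score(query, corpus[i]), reverse=True
    )
    return reranked[:k2]


# Hypothetical toy scorers: L1 counts shared tokens, L2 normalizes by document length.
def token_overlap(q: str, d: str) -> float:
    return float(len(set(q.split()) & set(d.split())))


def length_normalized_overlap(q: str, d: str) -> float:
    return token_overlap(q, d) / len(d.split())


corpus = [
    "dense retrieval with ann",
    "sparse retrieval with inverted index",
    "cooking pasta at home",
    "dense vectors",
]
top = cascade_search(
    "dense retrieval", corpus, token_overlap, length_normalized_overlap, k1=3, k2=2
)
print(top)  # -> [0, 3]
```

Document 2 never reaches L2 because it is cut at the L1 stage; L2 then promotes document 3 above document 1. The point of the split is that the expensive scorer only ever sees `k1` documents, regardless of corpus size.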

Traditionally, candidate retrieval was performed with an inverted index and exact lexical matching. Candidates are ordered by a dot-product-like scoring function f(q,d), where q and d are sparse vectors of token weights, typically derived from the token's frequency in the document or query and in the corpus. Because only the postings of the query's tokens are traversed, the inverted index enables sub-linear ranking of the documents. Owing to this sparse vector representation of documents and queries, lexical-match retrieval systems are also called sparse retrieval.
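As a minimal sketch of lexical (sparse) retrieval, the code below builds an inverted index and scores documents with the sparse dot product f(q,d) = sum over tokens t of q_t * d_t. The TF-IDF weighting (tf * log(N/df)) is an assumption chosen for brevity; the text above only says weights derive from document, query, and corpus frequencies, and production engines commonly use BM25. The names `build_index` and `sparse_search` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Postings = Dict[str, List[Tuple[int, float]]]


def build_index(docs: List[str]) -> Tuple[Postings, Dict[str, float]]:
    """Map each token to a postings list of (doc_id, weight) pairs.

    Assumed weighting: tf * idf with idf = log(N / df).
    """
    n = len(docs)
    tfs = [Counter(doc.split()) for doc in docs]
    df = Counter(tok for tf in tfs for tok in tf)  # document frequency per token
    idf = {tok: math.log(n / count) for tok, count in df.items()}
    index: Postings = defaultdict(list)
    for doc_id, tf in enumerate(tfs):
        for tok, freq in tf.items():
            index[tok].append((doc_id, freq * idf[tok]))
    return dict(index), idf


def sparse_search(
    query: str, index: Postings, idf: Dict[str, float], k: int = 10
) -> List[Tuple[int, float]]:
    """Score documents with the sparse dot product f(q, d) = sum_t q_t * d_t."""
    # Sparse query vector; tokens absent from the corpus contribute nothing.
    q = {tok: c * idf[tok] for tok, c in Counter(query.split()).items() if tok in idf}
    scores: Dict[int, float] = defaultdict(float)
    # Term-at-a-time traversal: only the postings of query tokens are read, so
    # documents sharing no token with the query are never touched.
    for tok, q_w in q.items():
        for doc_id, d_w in index[tok]:
            scores[doc_id] += q_w * d_w
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


docs = [
    "dense retrieval uses embeddings",
    "sparse retrieval uses an inverted index",
    "an inverted index stores postings",
]
index, idf = build_index(docs)
print([doc_id for doc_id, _ in sparse_search("sparse index", index, idf)])  # -> [1, 2]
```

Document 0 shares no token with the query, so it is never scored; this is the source of the sub-linear behaviour. Term-at-a-time accumulation is the simplest traversal; document-at-a-time traversal with dynamic pruning is a common alternative when posting lists are long.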

By contrast, dense retrieval represents queries and documents by embedding the text into lower-dimensional dense vectors. Candidate documents are scored by the similarity (or distance) between the query and document embedding vectors. In practice, these similarity computations are made efficient with approximate nearest neighbour (ANN) search systems, which avoid comparing the query against every document vector.
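To make the contrast concrete, the sketch below scores toy document embeddings by dot product, first exactly and then with a minimal inverted-file (IVF) index, one common family of ANN methods (graph-based indexes such as HNSW are another). The 2-d embeddings, the hand-picked centroids (normally learned with k-means), and the names `IVFIndex` and `nprobe` are illustrative assumptions, not the approach of any system discussed in the panel.

```python
import heapq
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def exact_knn(query: Vector, docs: Sequence[Vector], k: int) -> List[int]:
    """Exact search: compare the query with every document embedding."""
    return heapq.nlargest(k, range(len(docs)), key=lambda i: dot(query, docs[i]))


class IVFIndex:
    """Minimal inverted-file (IVF) approximate nearest neighbour index.

    Each document vector is assigned to its most similar centroid. A query scans
    only the lists of its `nprobe` most similar centroids, trading recall for speed.
    """

    def __init__(self, docs: Sequence[Vector], centroids: Sequence[Vector]):
        self.docs = docs
        self.centroids = centroids
        self.lists: List[List[int]] = [[] for _ in centroids]
        for i, d in enumerate(docs):
            best = max(range(len(centroids)), key=lambda c: dot(d, centroids[c]))
            self.lists[best].append(i)

    def search(self, query: Vector, k: int, nprobe: int = 1) -> List[int]:
        # Coarse step: choose which partitions to visit.
        probes = heapq.nlargest(
            nprobe, range(len(self.centroids)), key=lambda c: dot(query, self.centroids[c])
        )
        # Fine step: exact scoring restricted to documents in the probed partitions.
        candidates = [i for c in probes for i in self.lists[c]]
        return heapq.nlargest(k, candidates, key=lambda i: dot(query, self.docs[i]))


# Toy 2-d "embeddings"; a real system would obtain these from a trained encoder.
doc_vecs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
centroids = [(1.0, 0.0), (0.0, 1.0)]  # assumed; normally learned with k-means
ivf = IVFIndex(doc_vecs, centroids)
query = (0.8, 0.2)
print(exact_knn(query, doc_vecs, k=3))   # -> [0, 1, 3]
print(ivf.search(query, k=3, nprobe=1))  # -> [0, 1]
print(ivf.search(query, k=3, nprobe=2))  # -> [0, 1, 3]
```

With `nprobe=1` the index misses document 3, which exact search ranks third; probing more partitions recovers it at higher cost. This recall versus efficiency trade-off is the main knob that ANN systems expose.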

In this panel, we bring together experts in dense retrieval across multiple industry applications, including web search, enterprise and personal search, e-commerce, and out-of-domain retrieval.

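Cited By

Gou Y, Gao J, Xu Y, Long C (2025). SymphonyQG: Towards Symphonious Integration of Quantization and Graph for Approximate Nearest Neighbor Search. Proceedings of the ACM on Management of Data 3:1 (1-26). DOI: 10.1145/3709730. Online publication date: 11-Feb-2025.
https://dl.acm.org/doi/10.1145/3709730

Pan J, Wang J, Li G, Barcelo P, Sanchez-Pi N, Meliou A, Sudarshan S (2024). Vector Database Management Techniques and Systems. Companion of the 2024 International Conference on Management of Data (597-604). DOI: 10.1145/3626246.3654691. Online publication date: 9-Jun-2024.
https://dl.acm.org/doi/10.1145/3626246.3654691

Pan J, Wang J, Li G (2024). Survey of vector database management systems. The VLDB Journal 33:5 (1591-1615). DOI: 10.1007/s00778-024-00864-x. Online publication date: 15-Jul-2024.
https://doi.org/10.1007/s00778-024-00864-x

Zhuang S, Shou L, Pei J, Gong M, Ren H, Zuccon G, Jiang D (2023). Typos-aware Bottlenecked Pre-Training for Robust Dense Retrieval. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (212-222). DOI: 10.1145/3624918.3625324. Online publication date: 26-Nov-2023.
https://dl.acm.org/doi/10.1145/3624918.3625324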

Index Terms

Applications and Future of Dense Retrieval in Industry
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Recommendations

Improving zero-shot retrieval using dense external expansion
Abstract
Pseudo-relevance feedback (PRF) is a classical technique to improve search engine retrieval effectiveness, by closing the vocabulary gap between users’ query formulations and the relevant documents. While PRF is typically applied on ...
Highlights
- Dense external expansion improves zero-shot retrieval performance.
- High quality ...
Cluster-based Partial Dense Retrieval Fused with Sparse Text Retrieval
SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Previous work has demonstrated the potential to combine document rankings from dense and sparse retrievers for higher relevance effectiveness. This paper proposes a cluster-based partial dense retrieval scheme guided by sparse retrieval results to ...
Pseudo-Relevance Feedback for Multiple Representation Dense Retrieval
ICTIR '21: Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval

Pseudo-relevance feedback mechanisms, from Rocchio to the relevance models, have shown the usefulness of expanding and reweighting the users' initial queries using information occurring in an initial set of retrieved documents, known as the pseudo-...

Information & Contributors

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2022

3569 pages

ISBN:9781450387323

DOI:10.1145/3477495

General Chairs: Enrique Amigo (UNED), Pablo Castells (UAM and Amazon), Julio Gonzalo (UNED)

Program Chairs: Ben Carterette (Spotify), J. Shane Culpepper (RMIT University), Gabriella Kazai (Waseda University)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR '22

Sponsor: SIGIR

SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 11 - 15, 2022

Madrid, Spain

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Total Citations: 4
Total Downloads: 295
Downloads (last 12 months): 64
Downloads (last 6 weeks): 7

Reflects downloads up to 28 Feb 2025
