L2AP: Fast cosine similarity search with prefix L-2 norm bounds

Abstract:

The All-Pairs similarity search, or self-similarity join, problem finds all pairs of vectors in a high-dimensional sparse dataset whose similarity exceeds a given threshold. The problem has classically been solved using a dynamically built inverted index. Search time is reduced by pruning candidates early, using size-based and value-based bounds on the similarity. In the context of cosine similarity and weighted vectors, we leverage the Cauchy-Schwarz inequality to propose new ℓ²-norm bounds that reduce the inverted index size, the candidate pool size, and the number of full dot-product computations. We tighten previous candidate generation and verification bounds and introduce several new ones to further improve our algorithm's performance. Our new pruning strategies enable significant speedups over baseline approaches, in most cases outperforming even approximate solutions. We perform an extensive evaluation of our algorithm, L2AP, and compare it against state-of-the-art exact and approximate methods (AllPairs, MMJoin, and BayesLSH) across a variety of real-world datasets and similarity thresholds.
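The Cauchy-Schwarz idea mentioned in the abstract can be illustrated with a small sketch. This is not the L2AP algorithm itself (it builds no inverted index and uses none of the paper's specific bounds); it is a brute-force pairwise scan that assumes unit-normalized sparse vectors stored as dictionaries and stops accumulating a dot product once the part already computed, plus the product of the two remaining-suffix ℓ² norms, can no longer reach the threshold. The names `normalize`, `suffix_norms`, and `all_pairs` are hypothetical and chosen only for this illustration.

```python
import math
from itertools import combinations

EPS = 1e-12  # small slack so rounding error never prunes a true match


def normalize(vec):
    """Scale a sparse vector (dict: feature -> weight) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {f: w / norm for f, w in vec.items()}


def suffix_norms(vec, order):
    """Return a list where entry k is the L2 norm of vec over order[k:]."""
    suffix = [0.0] * (len(order) + 1)
    for k in range(len(order) - 1, -1, -1):
        w = vec.get(order[k], 0.0)
        suffix[k] = math.sqrt(suffix[k + 1] ** 2 + w * w)
    return suffix


def all_pairs(vectors, threshold):
    """Exact cosine all-pairs search with a Cauchy-Schwarz early exit.

    Returns (matches, pruned): matches is a list of (i, j, similarity),
    pruned counts the pairs abandoned before the full dot product.
    """
    vecs = [normalize(v) for v in vectors]
    order = sorted({f for v in vecs for f in v})  # fixed feature order
    sfx = [suffix_norms(v, order) for v in vecs]
    matches, pruned = [], 0
    for i, j in combinations(range(len(vecs)), 2):
        dot = 0.0
        for k, f in enumerate(order):
            # Cauchy-Schwarz: the not-yet-processed part of the dot product
            # is at most ||x[k:]|| * ||y[k:]||.
            if dot + sfx[i][k] * sfx[j][k] < threshold - EPS:
                pruned += 1
                break
            dot += vecs[i].get(f, 0.0) * vecs[j].get(f, 0.0)
        else:
            if dot >= threshold - EPS:
                matches.append((i, j, dot))
    return matches, pruned


if __name__ == "__main__":
    data = [
        {"x": 1.0, "y": 1.0},
        {"x": 1.0, "y": 1.0, "z": 0.1},
        {"z": 1.0},
    ]
    found, skipped = all_pairs(data, threshold=0.9)
    print(found, skipped)
```

Because the bound is an upper bound on the true similarity, pruning on it never discards a pair that meets the threshold, so the search stays exact. A practical implementation would iterate only over the nonzero features of each vector rather than over the full feature order used here for clarity.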

Published in: 2014 IEEE 30th International Conference on Data Engineering

Date of Conference: 31 March 2014 - 04 April 2014

Date Added to IEEE Xplore: 19 May 2014

Electronic ISBN: 978-1-4799-2555-1

DOI: 10.1109/ICDE.2014.6816700

Conference Location: Chicago, IL, USA
