Abstract
This paper introduces Turbo Scan (TS), a k-nearest neighbor search method tailored to high-dimensional data and to workloads where the cost of building an index cannot be amortized over time, for example when so few queries are executed on a dataset that the overhead of index construction is not warranted.
Rooted in the Johnson-Lindenstrauss (JL) lemma, our approach sidesteps the need for random rotations. We evaluate TS both algorithmically and experimentally. The results characterize TS's distinctive properties and show that it outperforms sequential scans by 1.7x at perfect recall and by 2.5x at 97% recall.
Notes
1. Instead of computing \(\sqrt{\sum _i |u_i-v_i|^2}\) we calculate \(\sum _i |u_i-v_i|^2\), which produces the same ordering of the results.
2. Available at: https://sisap-challenges.github.io/datasets/.
3. Available at: https://github.com/sadit/SlicedSearch.jl.
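
The first note above can be illustrated with a minimal sketch. It is not the Turbo Scan algorithm, only a plain brute-force scan on synthetic data, and the function names (`squared_euclidean`, `euclidean`, `knn_scan`) are hypothetical helpers introduced for illustration. Because the square root is monotonically increasing, ranking candidates by the squared sum should give the same nearest neighbors as ranking by the true Euclidean distance, while skipping one operation per comparison.

```python
import math
import random


def squared_euclidean(u, v):
    """Sum of squared coordinate differences (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean(u, v):
    """Standard Euclidean distance."""
    return math.sqrt(squared_euclidean(u, v))


def knn_scan(query, data, k, dist):
    """Brute-force sequential scan: indices of the k closest vectors under dist."""
    order = sorted(range(len(data)), key=lambda i: dist(query, data[i]))
    return order[:k]


# Synthetic data, purely illustrative (not one of the paper's datasets).
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(200)]
query = [rng.gauss(0.0, 1.0) for _ in range(32)]

# Both distance functions are expected to select the same neighbors in the same order.
same_ranking = (
    knn_scan(query, data, 10, squared_euclidean)
    == knn_scan(query, data, 10, euclidean)
)
```

If the actual distances are needed for reporting, the square root can be applied afterwards to the k returned results only.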
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chavez, E., Tellez, E.S. (2023). Turbo Scan: Fast Sequential Nearest Neighbor Search in High Dimensions. In: Pedreira, O., Estivill-Castro, V. (eds) Similarity Search and Applications. SISAP 2023. Lecture Notes in Computer Science, vol 14289. Springer, Cham. https://doi.org/10.1007/978-3-031-46994-7_9
DOI: https://doi.org/10.1007/978-3-031-46994-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46993-0
Online ISBN: 978-3-031-46994-7
eBook Packages: Computer Science, Computer Science (R0)