Abstract
Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large-scale, high-dimensional datasets. ANN methods typically differ in the index structure used for accelerating searches, resulting in various recall/runtime trade-off points. For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics. However, for applications with dynamic datasets, which are subject to frequent online changes (such as the addition of new samples), there is currently no consensus as to which ANN methods are most suitable. Traditional evaluation approaches do not consider the computational costs of updating the index structure, nor the rate and size of index updates. To address this, we empirically evaluate five popular ANN methods on two main applications (online data collection and online feature learning) while taking these considerations into account. Two dynamic datasets are used, derived from the SIFT1M dataset with 1 million samples and the DEEP1B dataset with 1 billion samples. The results indicate that the often-used k-d trees method is not suitable on dynamic datasets, as it is slower than a straightforward baseline exhaustive search method. For online data collection, the Hierarchical Navigable Small World Graphs method achieves a consistent speedup over baseline across a wide range of recall rates. For online feature learning, the Scalable Nearest Neighbours method is faster than baseline for recall rates below 75%.
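The evaluation protocol described above interleaves index updates with queries, so the baseline exhaustive search is the natural reference point: its "index update" is a trivial append, placing all cost on the query side. Below is a minimal, hypothetical sketch of such a baseline under the online data collection setting; the class and loop are illustrative only and are not the paper's DyANN code.

```python
import time
import numpy as np

class ExhaustiveIndex:
    """Illustrative baseline: brute-force k-NN over an append-only store."""

    def __init__(self, dim):
        self.data = np.empty((0, dim), dtype=np.float32)

    def add(self, vectors):
        # Online index update: simply append the new samples.
        self.data = np.vstack([self.data, vectors.astype(np.float32)])

    def query(self, q, k):
        # Exhaustive search: distance from q to every stored sample.
        dists = np.linalg.norm(self.data - q, axis=1)
        return np.argsort(dists)[:k]

# Online data collection loop: interleave batched insertions with queries,
# timing the query step, as the dynamic-dataset evaluation suggests.
rng = np.random.default_rng(0)
index = ExhaustiveIndex(dim=128)
for _ in range(10):
    index.add(rng.standard_normal((100, 128)))
    t0 = time.perf_counter()
    nn = index.query(rng.standard_normal(128), k=5)
    query_time = time.perf_counter() - t0

print(index.data.shape)  # (1000, 128)
```

An approximate method (e.g. an HNSW index) would be benchmarked in the same loop, with its per-batch insertion cost and recall against this baseline recorded alongside query time.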
Notes
1. Associated source code is available at https://github.com/data61/DyANN. The code is extensible with new ANN methods and new categories of dynamic search problems.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Harwood, B., Dezfouli, A., Chades, I., Sanderson, C. (2025). Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation. In: Gong, M., Song, Y., Koh, Y.S., Xiang, W., Wang, D. (eds) AI 2024: Advances in Artificial Intelligence. AI 2024. Lecture Notes in Computer Science(), vol 15443. Springer, Singapore. https://doi.org/10.1007/978-981-96-0351-0_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0350-3
Online ISBN: 978-981-96-0351-0
eBook Packages: Computer Science; Computer Science (R0)