Abstract
Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large-scale, high-dimensional datasets. ANN methods typically differ in the index structure used for accelerating searches, resulting in various recall/runtime trade-off points. For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics. However, for applications with dynamic datasets, which are subject to frequent online changes (such as the addition of new samples), there is currently no consensus as to which ANN methods are most suitable. Traditional evaluation approaches do not consider the computational costs of updating the index structure, nor the rate and size of index updates. To address this, we empirically evaluate five popular ANN methods on two main applications (online data collection and online feature learning) while taking these considerations into account. Two dynamic datasets are used, derived from the SIFT1M dataset with 1 million samples and the DEEP1B dataset with 1 billion samples. The results indicate that the often-used k-d trees method is not suitable on dynamic datasets, as it is slower than a straightforward baseline exhaustive search method. For online data collection, the Hierarchical Navigable Small World Graphs method achieves a consistent speedup over baseline across a wide range of recall rates. For online feature learning, the Scalable Nearest Neighbours method is faster than baseline for recall rates below 75%.
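The evaluation protocol described above interleaves index updates with queries, so the baseline exhaustive search is the natural reference point: its "index update" is a trivial append, placing all cost on the query side. Below is a minimal, hypothetical sketch of such a baseline under the online data collection setting; the class and loop are illustrative only and are not the paper's DyANN code.

```python
import time
import numpy as np

class ExhaustiveIndex:
    """Illustrative baseline: brute-force k-NN over an append-only store."""

    def __init__(self, dim):
        self.data = np.empty((0, dim), dtype=np.float32)

    def add(self, vectors):
        # Online index update: simply append the new samples.
        self.data = np.vstack([self.data, vectors.astype(np.float32)])

    def query(self, q, k):
        # Exhaustive search: distance from q to every stored sample.
        dists = np.linalg.norm(self.data - q, axis=1)
        return np.argsort(dists)[:k]

# Online data collection loop: interleave batched insertions with queries,
# timing the query step, as the dynamic-dataset evaluation suggests.
rng = np.random.default_rng(0)
index = ExhaustiveIndex(dim=128)
for _ in range(10):
    index.add(rng.standard_normal((100, 128)))
    t0 = time.perf_counter()
    nn = index.query(rng.standard_normal(128), k=5)
    query_time = time.perf_counter() - t0

print(index.data.shape)  # (1000, 128)
```

An approximate method (e.g. an HNSW index) would be benchmarked in the same loop, with its per-batch insertion cost and recall against this baseline recorded alongside query time.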
Notes
1. Associated source code is available at https://github.com/data61/DyANN. The code is extensible with new ANN methods and new categories of dynamic search problems.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Harwood, B., Dezfouli, A., Chades, I., Sanderson, C. (2025). Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation. In: Gong, M., Song, Y., Koh, Y.S., Xiang, W., Wang, D. (eds) AI 2024: Advances in Artificial Intelligence. AI 2024. Lecture Notes in Computer Science(), vol 15443. Springer, Singapore. https://doi.org/10.1007/978-981-96-0351-0_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0350-3
Online ISBN: 978-981-96-0351-0
eBook Packages: Computer Science; Computer Science (R0)