Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation

  • Conference paper
  • Published in: AI 2024: Advances in Artificial Intelligence (AI 2024)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15443)


Abstract

Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large-scale high-dimensional datasets. ANN methods typically differ in the index structure used for accelerating searches, resulting in various recall/runtime trade-off points. For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics. However, for applications with dynamic datasets, which are subject to frequent online changes (such as the addition of new samples), there is currently no consensus as to which ANN methods are most suitable. Traditional evaluation approaches do not consider the computational costs of updating the index structure, nor the rate and size of index updates. To address this, we empirically evaluate five popular ANN methods on two main applications (online data collection and online feature learning) while taking these considerations into account. Two dynamic datasets are used, derived from the SIFT1M dataset with 1 million samples and the DEEP1B dataset with 1 billion samples. The results indicate that the often-used k-d trees method is not suitable for dynamic datasets, as it is slower than a straightforward baseline exhaustive search method. For online data collection, the Hierarchical Navigable Small World Graphs method achieves a consistent speedup over the baseline across a wide range of recall rates. For online feature learning, the Scalable Nearest Neighbours method is faster than the baseline for recall rates below 75%.
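
As a concrete illustration of the "online data collection" setting described above, the following minimal sketch incrementally inserts batches of vectors into a Hierarchical Navigable Small World (HNSW) index and compares each query batch against a brute-force baseline. It uses the third-party hnswlib library, not the authors' DyANN code, and the dimensionality, batch size, and HNSW parameters (M, ef_construction, ef) are illustrative assumptions rather than values from the paper.

    import numpy as np
    import hnswlib

    dim, k = 128, 10
    rng = np.random.default_rng(0)

    # HNSW index (hnswlib) that supports incremental insertion of new samples.
    index = hnswlib.Index(space="l2", dim=dim)
    index.init_index(max_elements=100_000, ef_construction=200, M=16)
    index.set_ef(50)  # search-time quality/speed trade-off

    stored = []       # copy of all inserted vectors, used by the exhaustive baseline
    batch_size = 100

    for step in range(50):
        batch = rng.random((batch_size, dim), dtype=np.float32)

        if stored:
            # Approximate search: query the new batch against everything collected so far.
            labels, _ = index.knn_query(batch, k=k)

            # Baseline: exhaustive (brute-force) squared-L2 search over the same data.
            ref = np.vstack(stored)
            d2 = (batch**2).sum(1)[:, None] + (ref**2).sum(1)[None, :] - 2.0 * batch @ ref.T
            exact = np.argsort(d2, axis=1)[:, :k]

            # Recall of the approximate neighbours against the exact neighbours
            # (row i of ref has id i, since batches are inserted in order).
            recall = np.mean([len(set(a) & set(e)) / k for a, e in zip(labels, exact)])
            if step % 10 == 0:
                print(f"step {step}: recall@{k} = {recall:.3f}")

        # Index update: insert the new batch into the HNSW index.
        ids = np.arange(step * batch_size, (step + 1) * batch_size)
        index.add_items(batch, ids)
        stored.append(batch)

Timing the add_items and knn_query calls separately is one way to account for both the index-update cost and the search cost, which the evaluation above argues should be considered together for dynamic datasets.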


Notes

  1. Associated source code is available at https://github.com/data61/DyANN. The code is extensible with new ANN methods and new categories of dynamic search problems.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Harwood, B., Dezfouli, A., Chades, I., Sanderson, C. (2025). Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation. In: Gong, M., Song, Y., Koh, Y.S., Xiang, W., Wang, D. (eds) AI 2024: Advances in Artificial Intelligence. AI 2024. Lecture Notes in Computer Science, vol. 15443. Springer, Singapore. https://doi.org/10.1007/978-981-96-0351-0_8

  • DOI: https://doi.org/10.1007/978-981-96-0351-0_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0350-3

  • Online ISBN: 978-981-96-0351-0

  • eBook Packages: Computer Science, Computer Science (R0)
