A learned index for approximate kNN queries in high-dimensional spaces

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

Approximate k-nearest neighbor (kNN) search in high-dimensional spaces is a fundamental problem in computer systems and applications. However, traditional indexes for kNN search do not scale gracefully to massive high-dimensional datasets: as the dimensionality and data size grow, both query time and index space become considerable. Motivated by recent advances in learned indexes, we present a learned index for approximate kNN search in high-dimensional spaces, named HKC\(^{+}\)-index. First, a traditional tree-based index is constructed and used for query processing. Then, a deep neural network is trained as the learned index based on the incoming queries and the original tree index. Extensive experiments on a variety of real-world high-dimensional datasets demonstrate that HKC\(^{+}\)-index is up to 7 times faster and 8 times smaller than the original tree index, while preserving high accuracy.
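
The two-stage idea in the abstract (build a tree index, then train a model on queries answered against it) can be illustrated with a minimal sketch. This is not the HKC\(^{+}\)-index itself: the median-split tree, the single-layer softmax model standing in for the deep neural network, the brute-force labeling step, and all names (`build_leaves`, `train_softmax`, `learned_knn`, `probes`) are assumptions made for illustration only.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_leaves(points, ids, leaf_size):
    """Toy tree index (assumption): split on the widest dimension at the median."""
    if len(ids) <= leaf_size:
        return [ids]
    spread = lambda j: max(points[i][j] for i in ids) - min(points[i][j] for i in ids)
    d = max(range(len(points[0])), key=spread)
    ordered = sorted(ids, key=lambda i: points[i][d])
    mid = len(ordered) // 2
    return (build_leaves(points, ordered[:mid], leaf_size)
            + build_leaves(points, ordered[mid:], leaf_size))

def nn_leaf(points, leaf_of, q):
    """Label a training query with the leaf that holds its nearest neighbor.
    Brute force stands in for answering the query with the tree index."""
    best = min(range(len(points)), key=lambda i: dist2(points[i], q))
    return leaf_of[best]

def softmax_scores(W, q):
    """Class probabilities of a linear model; the last weight of each row is the bias."""
    z = [sum(w * x for w, x in zip(row, q)) + row[-1] for row in W]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_softmax(queries, labels, n_classes, epochs=30, lr=0.1):
    """Stand-in for the deep network: softmax regression trained with plain SGD."""
    dim = len(queries[0])
    W = [[0.0] * (dim + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for q, y in zip(queries, labels):
            p = softmax_scores(W, q)
            for c in range(n_classes):
                g = p[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    W[c][j] -= lr * g * q[j]
                W[c][dim] -= lr * g
    return W

def learned_knn(points, leaves, W, q, k, probes=2):
    """Scan only the `probes` most likely leaves and return the k closest ids."""
    p = softmax_scores(W, q)
    top = sorted(range(len(leaves)), key=lambda c: -p[c])[:probes]
    cand = [i for c in top for i in leaves[c]]
    return sorted(cand, key=lambda i: dist2(points[i], q))[:k]

random.seed(0)
points = [[random.random() for _ in range(8)] for _ in range(400)]
leaves = build_leaves(points, list(range(len(points))), leaf_size=50)
leaf_of = {i: c for c, ids in enumerate(leaves) for i in ids}
train_q = [[random.random() for _ in range(8)] for _ in range(300)]
labels = [nn_leaf(points, leaf_of, q) for q in train_q]
W = train_softmax(train_q, labels, len(leaves))
result = learned_knn(points, leaves, W, train_q[0], k=5)
```

In this sketch the model replaces tree traversal at query time, so only the leaf contents and the weight matrix need to be kept. Raising `probes` scans more leaves and trades speed for accuracy.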

Notes

  1. http://cs.princeton.edu/cass/audio.tar.gz.

  2. http://www.cs.cmu.edu/~enron/.

  3. http://groups.csail.mit.edu/vision/SUN/.

  4. http://cs-people.bu.edu/hekun/data/TALR/NUSWIDE.zip.

  5. http://phototour.cs.washington.edu/datasets/.

  6. http://corpus-texmex.irisa.fr/.

  7. https://github.com/mariusmuja/flann.

Acknowledgements

This work is supported by the Heilongjiang Province Natural Science Foundation under Grant YQ2019F016.

Author information

Corresponding author

Correspondence to Lingli Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, L., Cai, J. & Xu, J. A learned index for approximate kNN queries in high-dimensional spaces. Knowl Inf Syst 64, 3325–3342 (2022). https://doi.org/10.1007/s10115-022-01742-0

  • DOI: https://doi.org/10.1007/s10115-022-01742-0
