Abstract
Approximate k-nearest neighbor (kNN) search in high-dimensional spaces is a fundamental problem in computer systems and applications. However, traditional indexes for kNN search do not scale gracefully to massive high-dimensional datasets: as the dimensionality and data size grow, both their time and space costs increase considerably. Motivated by recent advances in learned indexes, we present a learned index for approximate kNN search in high-dimensional spaces, named HKC\(^{+}\)-index. First, a traditional tree-based index is constructed and used for query processing. Then, a deep neural network is trained as the learned index, based on the incoming queries and the original tree index. Extensive experiments on a variety of real-world high-dimensional datasets demonstrate that HKC\(^{+}\)-index is up to 7 times faster in running time and 8 times smaller than the original tree index, while preserving high accuracy.
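The two-stage idea in the abstract (build a tree index, then train a model on queries routed through it) can be illustrated with a toy sketch. This is not the HKC\(^{+}\)-index itself: the two-level median split stands in for the original tree index, a linear softmax classifier stands in for the deep neural network, and every name, size, and hyperparameter below is a hypothetical choice made only for illustration.

```python
import math
import random

rng = random.Random(0)
DIM, N_POINTS, K = 8, 400, 5  # assumed toy sizes, not from the paper

# Toy dataset: random points in the unit hypercube.
points = [[rng.random() for _ in range(DIM)] for _ in range(N_POINTS)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def median(values):
    s = sorted(values)
    return s[len(s) // 2]

# Step 1: a tiny stand-in "tree index": split on dimension 0 at its median,
# then split each half on dimension 1 at that half's median (4 leaves).
split0 = median(p[0] for p in points)
left = [p for p in points if p[0] < split0]
right = [p for p in points if p[0] >= split0]
split1 = [median(p[1] for p in left), median(p[1] for p in right)]

def tree_leaf(q):
    side = 0 if q[0] < split0 else 1
    return 2 * side + (0 if q[1] < split1[side] else 1)

leaves = [[] for _ in range(4)]
for p in points:
    leaves[tree_leaf(p)].append(p)

def tree_knn(q, k=K):
    # Route the query through the tree, then scan only that leaf.
    return sorted(leaves[tree_leaf(q)], key=lambda p: dist(p, q))[:k]

# Step 2: train a model to imitate the tree's routing on incoming queries.
# A linear softmax classifier is used here as a stand-in for a deep network.
W = [[0.0] * (DIM + 1) for _ in range(4)]  # last weight in each row is the bias

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict_probs(q):
    x = q + [1.0]
    return softmax([sum(w * v for w, v in zip(row, x)) for row in W])

def train(queries, epochs=30, lr=0.5):
    for _ in range(epochs):
        for q in queries:
            x = q + [1.0]
            probs = predict_probs(q)
            label = tree_leaf(q)  # supervision comes from the tree index
            for c in range(4):
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(DIM + 1):
                    W[c][j] -= lr * grad * x[j]

# Step 3: answer queries with the learned router instead of the tree.
def learned_knn(q, k=K):
    probs = predict_probs(q)
    leaf = probs.index(max(probs))
    return sorted(leaves[leaf], key=lambda p: dist(p, q))[:k]

queries = [[rng.random() for _ in range(DIM)] for _ in range(300)]
train(queries)
agreement = sum(learned_knn(q) == tree_knn(q) for q in queries) / len(queries)
print(f"fraction of queries where learned and tree answers match: {agreement:.2f}")
```

In this sketch the learned router only replaces the leaf-selection step; the speed and size gains reported in the abstract come from the authors' actual design, which this toy does not reproduce.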






Acknowledgements
This work was supported by the Heilongjiang Province Natural Science Foundation (grant YQ2019F016).
About this article
Cite this article
Li, L., Cai, J. & Xu, J. A learned index for approximate kNN queries in high-dimensional spaces. Knowl Inf Syst 64, 3325–3342 (2022). https://doi.org/10.1007/s10115-022-01742-0
DOI: https://doi.org/10.1007/s10115-022-01742-0