Abstract
Many modern search applications are high-dimensional and depend on efficient orthogonal range queries. These applications span web-based and scientific needs as well as uses for data mining. Although k-nearest neighbor (kNN) queries are becoming increasingly common due to mobile and geospatial applications, orthogonal range queries in high-dimensional data remain extremely important and relevant. For efficient querying, data is typically stored in an index optimized for either kNN or range queries. This can be problematic when data is optimized for kNN retrieval and a user needs a range query, or vice versa. Here, we address the issue of using a kNN-based index for range queries and outline the general computational geometry problem of adapting these systems to range queries. We refer to these methods as space-based decompositions and provide a straightforward heuristic for this problem. Using iDistance as our applied kNN indexing technique, we also develop an optimal (data-based) algorithm designed specifically for its indexing scheme. We compare this method to the suggested naïve approach using real-world datasets. The data-based algorithm consistently outperforms the naïve approach.

Cite this article
Wylie, T., Schuh, M.A. & Angryk, R.A. Enabling high-dimensional range queries using kNN indexing techniques: approaches and empirical results. J Comb Optim 32, 1107–1132 (2016). https://doi.org/10.1007/s10878-015-9927-1
DOI: https://doi.org/10.1007/s10878-015-9927-1