ABSTRACT
Near neighbor search in high-dimensional spaces is useful in many applications. Existing techniques solve this problem efficiently only in the approximate case. These solutions are designed to answer r-near neighbor queries for a fixed query range, or for a set of query ranges, with probabilistic guarantees, and are then extended to nearest neighbor queries. Solutions that support a set of query ranges suffer from prohibitive space cost. Many applications are quality sensitive and need efficient and accurate support for near neighbor queries over all query ranges. In this paper, we propose a novel indexing and querying scheme called Spatial Intersection and Metric Pruning (SIMP). It efficiently supports r-near neighbor queries in very high-dimensional spaces for all query ranges, with a 100% quality guarantee and practical storage costs. Our empirical studies on three real datasets with dimensions between 32 and 256 and sizes up to 10 million points show superior performance of SIMP over LSH, Multi-Probe LSH, LSB tree, and iDistance. Our scalability tests on real datasets with as many as 100 million points and up to 256 dimensions establish that SIMP scales linearly with query range, dataset dimension, and dataset size.
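To make the query semantics concrete, the following is a minimal sketch of an exact r-near neighbor query answered by a linear scan. It is not SIMP and says nothing about how SIMP indexes or prunes; it only illustrates the result set that a method with a 100% quality guarantee has to return for any range r. The function name `r_near_neighbors`, the toy points, and the choice of Euclidean distance are assumptions made for illustration.

```python
import math


def r_near_neighbors(points, query, r):
    """Return indices of all points within distance r of query.

    Exact linear scan; Euclidean distance is an assumption of this sketch.
    """
    return [i for i, p in enumerate(points) if math.dist(p, query) <= r]


# Hypothetical toy data: three 2-dimensional points.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]

print(r_near_neighbors(points, (0.0, 0.0), 1.0))  # [0, 1]
print(r_near_neighbors(points, (0.0, 0.0), 5.0))  # [0, 1, 2]
```

A scan like this works for every query range but touches every point on every query, which is the cost an index aims to avoid while still returning the same result set.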
Index Terms
- SIMP: accurate and efficient near neighbor search in high dimensional spaces