research-article

SIMP: accurate and efficient near neighbor search in high dimensional spaces

Authors:

Vishwakarma Singh,

Ambuj K. Singh

EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology

Pages 492 - 503

https://doi.org/10.1145/2247596.2247654

Published: 27 March 2012

Abstract

Near neighbor search in high dimensional spaces is useful in many applications. Existing techniques solve this problem efficiently only in the approximate case. These solutions are designed to answer r-near neighbor queries for a fixed query range, or for a set of query ranges, with probabilistic guarantees, and are then extended to nearest neighbor queries. Solutions supporting a set of query ranges suffer from prohibitive space cost. Many applications are quality sensitive and need to support near neighbor queries efficiently and accurately for all query ranges. In this paper, we propose a novel indexing and querying scheme called Spatial Intersection and Metric Pruning (SIMP). It efficiently supports r-near neighbor queries in very high dimensional spaces for all query ranges with a 100% quality guarantee and with practical storage costs. Our empirical studies on three real datasets with dimensions between 32 and 256 and sizes up to 10 million points show superior performance of SIMP over LSH, Multi-Probe LSH, LSB tree, and iDistance. Our scalability tests on real datasets with as many as 100 million points and up to 256 dimensions establish that SIMP scales linearly with query range, dataset dimension, and dataset size.
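To make the query type concrete, the following is a minimal sketch of what an exact r-near neighbor query returns, written as a brute-force linear scan. It is not the SIMP index or any method from the paper; it only illustrates the result set that an exact scheme is expected to reproduce. The function names `euclidean` and `r_near_neighbors` are hypothetical, and the use of Euclidean distance is an assumption, since the abstract does not name the metric.

```python
import math
from typing import List, Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean (L2) distance between two points of equal dimension.

    Assumption: the abstract does not state the metric; L2 is used here
    purely for illustration.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def r_near_neighbors(
    points: Sequence[Sequence[float]],
    query: Sequence[float],
    r: float,
) -> List[int]:
    """Return the indices of all points within distance r of the query.

    This is an exact linear scan: it checks every point, so it costs
    O(n * d) per query. It is a baseline definition of the result set,
    not an index structure.
    """
    return [i for i, p in enumerate(points) if euclidean(p, query) <= r]


# Tiny illustrative dataset (hypothetical values).
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
query = (0.0, 0.0)

small_range = r_near_neighbors(points, query, 1.0)
large_range = r_near_neighbors(points, query, 5.0)
print(small_range)
print(large_range)
```

Because the same function answers any value of `r`, the scan trivially supports all query ranges, but only at linear cost per query; the abstract's claim is that an index can keep that exactness while being efficient.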

Index Terms

SIMP: accurate and efficient near neighbor search in high dimensional spaces
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Information & Contributors

Information

Published In

EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology

March 2012

643 pages

ISBN:9781450307901

DOI:10.1145/2247596

Editors:
Elke Rundensteiner, Worcester Polytechnic Institute
Volker Markl, Technische Universität Berlin, Germany
Ioana Manolescu, INRIA, France
Sihem Amer-Yahia, QCRI, Doha, Qatar
Felix Naumann, Hasso Plattner Institute, Potsdam, Germany
Ismail Ari, Ozyegin University, Turkey

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Division of Information and Intelligent Systems

Conference

EDBT '12

EDBT '12: 15th International Conference on Extending Database Technology

March 27 - 30, 2012

Berlin, Germany

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 11
Total Downloads: 216
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Feb 2025

Citations

Cited By

Zhu L, Zhang S, Song X, Ma Q, Meng W (2023). Processing Reverse Nearest Neighbor Queries Based on Unbalanced Multiway Region Tree Index. Web Information Systems Engineering – WISE 2023, pages 733-747. DOI: 10.1007/978-981-99-7254-8_57. Online publication date: 21-Oct-2023.
https://doi.org/10.1007/978-981-99-7254-8_57
Zhu L, Li P, Wei Y, Song X, Wang Y (2021). Processing Approximate KNN Query Based on Data Source Selection. 2021 International Conference on Intelligent Computing, Automation and Applications (ICAA), pages 672-676. DOI: 10.1109/ICAA53760.2021.00121. Online publication date: Jun-2021.
https://doi.org/10.1109/ICAA53760.2021.00121
Zhu L, Li X, Wei Y, Ma Q, Meng W (2021). Integrating Real-Time Entity Resolution with Top-N Join Query Processing. Knowledge Science, Engineering and Management, pages 111-123. DOI: 10.1007/978-3-030-82153-1_10. Online publication date: 7-Aug-2021.
https://doi.org/10.1007/978-3-030-82153-1_10
Zhu L, Cheng Y, Wang Y, Ma Q, Meng W (2020). Evaluating Top-N Join Queries with Real-time Entity Resolution. Journal of Physics: Conference Series, 1575 (012084). DOI: 10.1088/1742-6596/1575/1/012084. Online publication date: 14-Jul-2020.
https://doi.org/10.1088/1742-6596/1575/1/012084
Zhu L, Lu W, Ma Q, Meng W (2018). Region-tree Based Sorted List Indexing for Real-time Entity Resolution in n-Dimensional Data Space. IOP Conference Series: Materials Science and Engineering, 466 (012025). DOI: 10.1088/1757-899X/466/1/012025. Online publication date: 28-Dec-2018.
https://doi.org/10.1088/1757-899X/466/1/012025
Prajapati G, Bhartiya R (2017). High dimensional nearest neighbor search considering outliers based on fuzzy membership. 2017 Computing Conference, pages 363-371. DOI: 10.1109/SAI.2017.8252127. Online publication date: Jul-2017.
https://doi.org/10.1109/SAI.2017.8252127
Singh V, Zong B, Singh A (2016). Nearest Keyword Set Search in Multi-Dimensional Datasets. IEEE Transactions on Knowledge and Data Engineering, 28:3, pages 741-755. DOI: 10.1109/TKDE.2015.2492549. Online publication date: 1-Mar-2016.
https://dl.acm.org/doi/10.1109/TKDE.2015.2492549
Zhu L, Liu F, Meng W, Ma Q, Wang Y, Yuan F (2016). Evaluating Top-N queries in n-dimensional normed spaces. Information Sciences: an International Journal, 374:C, pages 255-275. DOI: 10.1016/j.ins.2016.09.035. Online publication date: 20-Dec-2016.
https://dl.acm.org/doi/10.1016/j.ins.2016.09.035
Schuh M, Wylie T, Angryk R (2013). Improving the Performance of High-Dimensional kNN Retrieval through Localized Dataspace Segmentation and Hybrid Indexing. Proceedings of the 17th East European Conference on Advances in Databases and Information Systems - Volume 8133, pages 344-357. DOI: 10.5555/2939301.2939313. Online publication date: 1-Sep-2013.
https://dl.acm.org/doi/10.5555/2939301.2939313
Schuh M, Wylie T, Angryk R (2013). Improving the Performance of High-Dimensional kNN Retrieval through Localized Dataspace Segmentation and Hybrid Indexing. Advances in Databases and Information Systems, pages 344-357. DOI: 10.1007/978-3-642-40683-6_26. Online publication date: 2013.
https://doi.org/10.1007/978-3-642-40683-6_26
