Distributed high dimensional indexing for k-NN search

Choi, Hyun-Hwa; Lee, Mi-Young; Lee, Kyu-Chul

doi:10.1007/s11227-012-0800-z

Published: 15 June 2012

The Journal of Supercomputing, volume 62, pages 1362–1384 (2012)

Hyun-Hwa Choi¹,
Mi-Young Lee¹ &
Kyu-Chul Lee²

Abstract

Although conventional index structures provide various nearest-neighbor search algorithms for high-dimensional data, large-scale datasets impose additional requirements: higher search performance and index scalability. To meet these requirements, we propose a distributed high-dimensional index structure for cluster systems, called the Distributed Vector Approximation-tree (DVA-tree), a two-level structure consisting of a hybrid spill-tree and Vector Approximation files (VA-files). We also describe the algorithms used for constructing the DVA-tree over multiple machines and performing distributed k-nearest neighbor (k-NN) searches. To evaluate the performance of the DVA-tree, we conduct an experimental study using both real and synthetic datasets. The results show that our proposed method has significant performance advantages over existing index structures on different kinds of datasets.
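
The abstract does not give the DVA-tree algorithms, so the following is only a minimal Python sketch of the general idea behind the lower level: a VA-file-style filter-and-refine k-NN search run per partition, with the partial results merged into a global answer. It assumes data normalized to [0, 1] and Euclidean distance, and it queries every partition instead of routing through a hybrid spill-tree as the paper's upper level does. All names (`quantize`, `lower_bound`, `local_knn`, `distributed_knn`) and the `bits` parameter are hypothetical and do not come from the article.

```python
import heapq
import math


def quantize(vec, bits):
    """Approximate a vector in [0, 1]^d by one grid-cell index per dimension."""
    cells = 1 << bits
    return tuple(min(cells - 1, max(0, int(x * cells))) for x in vec)


def lower_bound(query, approx, bits):
    """Smallest possible Euclidean distance from query to any point in the cell."""
    width = 1.0 / (1 << bits)
    total = 0.0
    for q, c in zip(query, approx):
        cell_lo = c * width
        cell_hi = cell_lo + width
        if q < cell_lo:
            total += (cell_lo - q) ** 2
        elif q > cell_hi:
            total += (q - cell_hi) ** 2
    return math.sqrt(total)


def local_knn(query, vectors, approxs, k, bits):
    """Filter-and-refine k-NN over one partition (one simulated node)."""
    # Filter step: rank candidates using only the compact approximations.
    ranked = sorted((lower_bound(query, a, bits), i) for i, a in enumerate(approxs))
    best = []  # max-heap emulated with negated distances: (-distance, index)
    for lb, i in ranked:
        # No remaining candidate can beat the current k-th best distance.
        if len(best) == k and lb >= -best[0][0]:
            break
        # Refine step: compute the exact distance on the full vector.
        d = math.dist(query, vectors[i])
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in best)


def distributed_knn(query, partitions, k, bits=4):
    """Run local_knn on every partition and merge the partial top-k lists.

    Assumption: approximations are built here for brevity; a real index
    would build them once at construction time.
    """
    partial = []
    for node_id, vectors in enumerate(partitions):
        approxs = [quantize(v, bits) for v in vectors]
        for d, i in local_knn(query, vectors, approxs, k, bits):
            partial.append((d, node_id, i))
    return heapq.nsmallest(k, partial)
```

Calling `distributed_knn(query, partitions, k)` with `partitions` as a list of lists of vectors returns up to `k` tuples of the form `(distance, node_id, local_index)`, sorted by distance. Because each partition returns its own exact top-k, merging the partial lists yields the same distances as a brute-force scan, apart from ties.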

Acknowledgements

This work was supported by the IT R&D program of MKE/KEIT [10038768, The Development of Supercomputing System for the Genome Analysis].

Author information

Authors and Affiliations

Electronics and Telecommunications Research Institute, Daejeon, Rep. of Korea
Hyun-Hwa Choi & Mi-Young Lee
Chungnam National University, Daejeon, Rep. of Korea
Kyu-Chul Lee

Corresponding author

Correspondence to Kyu-Chul Lee.

About this article

Cite this article

Choi, HH., Lee, MY. & Lee, KC. Distributed high dimensional indexing for k-NN search. J Supercomput 62, 1362–1384 (2012). https://doi.org/10.1007/s11227-012-0800-z

Issue Date: December 2012
DOI: https://doi.org/10.1007/s11227-012-0800-z
