Distributed high dimensional indexing for k-NN search

The Journal of Supercomputing

Abstract

Conventional index structures provide various nearest-neighbor search algorithms for high-dimensional data, but large-scale datasets impose additional requirements: higher search performance and index scalability. To meet these requirements, we propose a distributed high-dimensional index structure for cluster systems, called the Distributed Vector Approximation-tree (DVA-tree), a two-level structure consisting of a hybrid spill-tree and Vector Approximation files (VA-files). We also describe the algorithms for constructing the DVA-tree over multiple machines and for performing distributed k-nearest neighbor (k-NN) searches. To evaluate the performance of the DVA-tree, we conduct an experimental study using both real and synthetic datasets. The results show that the proposed method has significant performance advantages over existing index structures on different kinds of datasets.
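
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch of a VA-file style k-NN search combined with a merge of per-partition results. It is not the authors' DVA-tree algorithm: the hybrid spill-tree that forms the first level is replaced here by a simple broadcast of the query to every partition, and all names (`build_partition`, `local_knn`, `distributed_knn`), the uniform grid quantization, and the `bits` parameter are assumptions made for illustration.

```python
import heapq
import math


def euclidean(a, b):
    """Exact Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantize(point, lo, hi, bits):
    """Approximate a vector by one grid cell index per dimension (assumed uniform grid)."""
    cells = 2 ** bits
    out = []
    for x, a, b in zip(point, lo, hi):
        width = (b - a) / cells
        idx = int((x - a) / width) if width > 0 else 0
        out.append(min(max(idx, 0), cells - 1))
    return tuple(out)


def lower_bound(query, cell, lo, hi, bits):
    """Smallest possible distance from the query to any vector inside the cell."""
    cells = 2 ** bits
    total = 0.0
    for q, c, a, b in zip(query, cell, lo, hi):
        width = (b - a) / cells
        left, right = a + c * width, a + (c + 1) * width
        if q < left:
            total += (left - q) ** 2
        elif q > right:
            total += (q - right) ** 2
    return math.sqrt(total)


def build_partition(ids, points, lo, hi, bits):
    """Hypothetical per-machine store: exact vectors plus their approximations."""
    return {"ids": list(ids), "points": list(points),
            "approx": [quantize(p, lo, hi, bits) for p in points]}


def local_knn(part, query, k, lo, hi, bits):
    """Filter on approximations first, then refine with exact distances."""
    # Phase 1: rank candidates by lower bound, using only the approximations.
    order = sorted((lower_bound(query, c, lo, hi, bits), i)
                   for i, c in enumerate(part["approx"]))
    best = []  # max-heap of the k best so far, stored as (-distance, index)
    for lb, i in order:
        # Phase 2: stop once no remaining candidate can beat the k-th best.
        if len(best) == k and lb > -best[0][0]:
            break
        d = euclidean(query, part["points"][i])
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return [(-nd, part["ids"][i]) for nd, i in best]


def distributed_knn(partitions, query, k, lo, hi, bits):
    """Merge local top-k lists; on a cluster each local search would run remotely."""
    partial = []
    for part in partitions:
        partial.extend(local_knn(part, query, k, lo, hi, bits))
    return heapq.nsmallest(k, partial)
```

Because every partition returns its own k best candidates, the global k nearest neighbors are always contained in the merged list, so the final `heapq.nsmallest` call yields an exact result. A routing structure such as the spill-tree described in the abstract would aim to contact fewer partitions than this broadcast does.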

Acknowledgements

This work was supported by the IT R&D program of MKE/KEIT [10038768, The Development of Supercomputing System for the Genome Analysis].

Author information

Corresponding author

Correspondence to Kyu-Chul Lee.

About this article

Cite this article

Choi, HH., Lee, MY. & Lee, KC. Distributed high dimensional indexing for k-NN search. J Supercomput 62, 1362–1384 (2012). https://doi.org/10.1007/s11227-012-0800-z

  • DOI: https://doi.org/10.1007/s11227-012-0800-z
