Effective optimizations of cluster-based nearest neighbor search in high-dimensional space

  • Special Issue Paper
Multimedia Systems

Abstract

Nearest neighbor (NN) search in high-dimensional space plays a fundamental role in large-scale image retrieval. It requires efficient indexing and search techniques, both of which are essential for similarity search and semantic analysis. However, breakthroughs have been rare in recent years, and the performance of current NN search techniques is far from satisfactory, especially for exact NN search. A recently proposed method, HB, addresses exact NN search efficiently in high-dimensional space. It benefits from cluster-based techniques, which exploit interdimensional correlations to generate a more compact representation of the data set than other techniques. However, HB incurs a high cost for lower bound computations and provides no further pruning scheme for points in candidate clusters. In this paper, we extend the HB method to address exact NN search in correlated, high-dimensional vector data sets extracted from large-scale image databases by introducing two new pruning/selection techniques; we call the resulting method HB+. The first technique selects more quickly the subset of hyperplanes/clusters that must be considered. The second technique prunes irrelevant points in the selected subset of clusters. Experiments show that HB+ improves on HB in efficiency (I/O cost and CPU response time) and outperforms other exact NN indexes.
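To make the idea of cluster-level lower bounds concrete, the following is a minimal sketch of generic cluster-based exact NN search with separating-hyperplane bounds. It is not the HB or HB+ implementation: it assumes clusters are Voronoi cells of given centroids, evaluates every hyperplane rather than a selected subset, and does no point-level pruning inside a visited cluster. All function names (`assign_to_clusters`, `hyperplane_lower_bound`, `exact_nn`) are hypothetical and chosen for illustration only.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_clusters(points, centroids):
    """Voronoi assignment: each point goes to its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        clusters[j].append(p)
    return clusters


def hyperplane_lower_bound(q_dists, centroids, j):
    """Lower bound on the distance from the query to any point in cell j.

    q_dists[i] is the distance from the query to centroid i. The hyperplane
    that separates cells i and j bisects the segment between their centroids,
    so the signed distance from the query to it is
    (q_dists[j]^2 - q_dists[i]^2) / (2 * ||c_i - c_j||).
    Cell j lies entirely on the far side of every such hyperplane, so the
    largest positive value over all i is a valid bound.
    """
    best = 0.0
    for i in range(len(centroids)):
        if i == j:
            continue
        gap = dist(centroids[i], centroids[j])
        if gap == 0.0:
            continue  # coincident centroids define no hyperplane
        best = max(best, (q_dists[j] ** 2 - q_dists[i] ** 2) / (2.0 * gap))
    return best


def exact_nn(query, centroids, clusters):
    """Scan clusters in ascending lower-bound order; stop when no cluster can win."""
    q_dists = [dist(query, c) for c in centroids]
    bounds = sorted(
        (hyperplane_lower_bound(q_dists, centroids, j), j)
        for j in range(len(centroids))
    )
    best_d, best_p, visited = math.inf, None, 0
    for lower_bound, j in bounds:
        if lower_bound >= best_d:
            break  # every remaining cluster has an even larger bound
        visited += 1
        for p in clusters[j]:
            d = dist(query, p)
            if d < best_d:
                best_d, best_p = d, p
    return best_p, best_d, visited
```

The third return value, `visited`, counts the clusters actually scanned, which is a rough proxy for I/O cost in this sketch. Computing the bound for every cluster against every other centroid is quadratic in the number of clusters, which illustrates why the cost of lower bound computations matters.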

Notes

  1. In the sequel, we shall use “\(k\)-NN search” to refer to “exact \(k\)-NN search” for simplicity.

  2. In the sequel, we shall use the term “lower bound” to refer to the lower bound distance between a query and a cluster for simplicity.

  3. This concept refers to our suggestion of selecting only a subset of the separating hyperplanes. In the experiments, we vary \(\alpha\), the proportion of selected separating hyperplanes, and report the empirical results.

  4. Download from http://archive.ics.uci.edu/ml/datasets/Corel+Image+Features/.

  5. Download from http://vision.ece.ucsb.edu/download.html.

  6. Download from http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61173089, 61202179, and 61472298), SRF for ROCS, SEM, and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Hui Li.

Cite this article

Feng, X., Cui, J., Liu, Y. et al. Effective optimizations of cluster-based nearest neighbor search in high-dimensional space. Multimedia Systems 23, 139–153 (2017). https://doi.org/10.1007/s00530-014-0444-3

  • DOI: https://doi.org/10.1007/s00530-014-0444-3
