Effective optimizations of cluster-based nearest neighbor search in high-dimensional space

Feng, Xiaokang; Cui, Jiangtao; Liu, Yingfan; Li, Hui

doi:10.1007/s00530-014-0444-3


Special Issue Paper
Published: 24 December 2014

Multimedia Systems, Volume 23, pages 139–153 (2017)


Abstract

Nearest neighbor (NN) search in high-dimensional space plays a fundamental role in large-scale image retrieval. It requires efficient indexing and search techniques, both of which are essential for similarity search and semantic analysis. However, breakthroughs have been rare in recent years, and the performance of current NN search techniques remains far from satisfactory, especially for exact NN search. A recently proposed method, HB, addresses exact NN search efficiently in high-dimensional space. It benefits from cluster-based techniques, which can generate a more compact representation of the data set than other techniques by exploiting interdimensional correlations. However, HB incurs a high cost for lower-bound computations and provides no further pruning scheme for points in candidate clusters. In this paper, we extend HB to address exact NN search in correlated, high-dimensional vector data sets extracted from large-scale image databases by introducing two new pruning/selection techniques; we call the resulting method HB+. The first technique selects more quickly the subset of hyperplanes/clusters that must be considered. The second technique prunes irrelevant points in the selected subset of clusters. Experiments show the improvement of HB+ over HB in terms of efficiency (I/O cost and CPU response time) and also demonstrate its superiority over other exact NN indexes.
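To make the cluster-pruning idea concrete, here is a minimal Python sketch of exact 1-NN search over clusters, assuming Voronoi (nearest-centroid) clusters such as those produced by k-means. For centroids \(c_m, c_n\) and query \(q\), the sketch uses the separating-hyperplane bound \(\mathrm{LB}(q, C_m) = \max\bigl(0, \max_{n \neq m} \tfrac{\|q - c_m\|^2 - \|q - c_n\|^2}{2\|c_n - c_m\|}\bigr)\), in the spirit of the hyperplane bound used by HB. The per-point test inside a candidate cluster is a plain triangle-inequality filter on precomputed point-to-centroid distances; it is a stand-in chosen for illustration, not the HB+ point-pruning technique, whose details are not given here. The function names (`assign_voronoi`, `hyperplane_lower_bound`, `exact_nn`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_voronoi(points: List[Vector], centroids: List[Vector]) -> List[List[Vector]]:
    """Put every point in the cluster of its nearest centroid (a Voronoi partition)."""
    clusters: List[List[Vector]] = [[] for _ in centroids]
    for p in points:
        m = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        clusters[m].append(p)
    return clusters


def hyperplane_lower_bound(q: Vector, centroids: List[Vector], m: int) -> float:
    """Lower bound on the distance from q to any point of Voronoi cluster m.

    The bisecting hyperplane between c_m and c_n keeps cluster m on c_m's side.
    When q is closer to c_n, its (positive) distance to that hyperplane bounds
    the distance to every point of cluster m; otherwise the term is <= 0.
    """
    cm = centroids[m]
    dqm = sq_dist(q, cm)
    bound = 0.0
    for n, cn in enumerate(centroids):
        if n == m:
            continue
        gap = math.sqrt(sq_dist(cm, cn))
        if gap > 0.0:
            bound = max(bound, (dqm - sq_dist(q, cn)) / (2.0 * gap))
    return bound


def exact_nn(q: Vector, centroids: List[Vector],
             clusters: List[List[Vector]]) -> Tuple[Optional[Vector], float]:
    """Exact 1-NN: visit clusters by increasing lower bound and prune the rest."""
    # Point-to-centroid distances; a real index would store these offline.
    radii = [[math.sqrt(sq_dist(p, centroids[m])) for p in pts]
             for m, pts in enumerate(clusters)]
    bounds = sorted((hyperplane_lower_bound(q, centroids, m), m)
                    for m in range(len(centroids)))
    best_pt: Optional[Vector] = None
    best_d = math.inf
    for lb, m in bounds:
        if lb >= best_d:
            break  # all remaining clusters have bounds at least this large
        dqc = math.sqrt(sq_dist(q, centroids[m]))
        for p, r in zip(clusters[m], radii[m]):
            # Triangle inequality: |d(q, c) - d(p, c)| <= d(q, p).
            if abs(dqc - r) >= best_d:
                continue
            d = math.sqrt(sq_dist(q, p))
            if d < best_d:
                best_pt, best_d = p, d
    return best_pt, best_d
```

Because clusters are visited in increasing order of their bound, the loop can stop as soon as a bound reaches the best distance found so far; the answer stays exact because every bound is a true lower bound on the distance to that cluster.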


Similar content being viewed by others

Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets (Article, open access, 06 November 2019)

K-Means algorithm based on multi-feature-induced order (Article, 09 April 2024)

Sparse semi-supervised multi-label feature selection based on latent representation (Article, open access, 17 April 2024)

Notes

In the sequel, we shall use “\(k\)-NN search” to refer to “exact \(k\)-NN search” for simplicity.
In the sequel, we shall use the term “lower bound” to refer to the lower bound distance between a query and a cluster for simplicity.
This concept refers to our suggestion of partially selecting separating hyperplanes. In our experiments, we evaluated the effect of varying \(\alpha\), the proportion of selected separating hyperplanes.
Download from http://archive.ics.uci.edu/ml/datasets/Corel+Image+Features/.
Download from http://vision.ece.ucsb.edu/download.html.
Download from http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm.
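The note on \(\alpha\) refers to computing lower bounds from only part of the separating hyperplanes. The Python sketch below assumes one possible selection rule, keeping for each cluster the hyperplanes it shares with its \(\lceil \alpha (K-1) \rceil\) nearest centroids, where \(K\) is the number of clusters; the rule and the names `select_hyperplanes` and `partial_lower_bound` are hypothetical, not the paper's. Any subset of hyperplanes still yields a valid lower bound, because each hyperplane term is individually a lower bound for a Voronoi cluster; using fewer hyperplanes trades some bound tightness (and hence pruning power) for fewer distance evaluations per cluster.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_hyperplanes(centroids: List[Vector], alpha: float) -> List[List[int]]:
    """Hypothetical rule: for each cluster m, keep the hyperplanes it shares with
    its ceil(alpha * (K - 1)) nearest centroids (computed once, offline)."""
    k = len(centroids)
    keep = math.ceil(alpha * (k - 1))
    selected = []
    for m, cm in enumerate(centroids):
        others = sorted((n for n in range(k) if n != m),
                        key=lambda n: sq_dist(cm, centroids[n]))
        selected.append(others[:keep])
    return selected


def partial_lower_bound(q: Vector, centroids: List[Vector], m: int,
                        selected: List[List[int]]) -> float:
    """Hyperplane lower bound for Voronoi cluster m using only selected[m]."""
    cm = centroids[m]
    dqm = sq_dist(q, cm)
    bound = 0.0
    for n in selected[m]:
        cn = centroids[n]
        gap = math.sqrt(sq_dist(cm, cn))
        if gap > 0.0:
            # Positive only when q lies on c_n's side of the bisector.
            bound = max(bound, (dqm - sq_dist(q, cn)) / (2.0 * gap))
    return bound
```

With \(\alpha = 1\) this reduces to the full hyperplane bound; smaller values of \(\alpha\) give bounds that are never larger, so cluster pruning stays exact but may discard fewer clusters.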


Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61173089, 61202179 and 61472298), the SRF for ROCS, SEM, and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

School of Computer Science and Technology, Xidian University, Xi’an, 710071, China
Xiaokang Feng, Jiangtao Cui, Yingfan Liu & Hui Li


Corresponding author

Correspondence to Hui Li.


About this article

Cite this article

Feng, X., Cui, J., Liu, Y. et al. Effective optimizations of cluster-based nearest neighbor search in high-dimensional space. Multimedia Systems 23, 139–153 (2017). https://doi.org/10.1007/s00530-014-0444-3


Published: 24 December 2014
Issue Date: February 2017
DOI: https://doi.org/10.1007/s00530-014-0444-3
