Abstract
Nearest neighbor (NN) search in high-dimensional space plays a fundamental role in large-scale image retrieval. It requires efficient indexing and search techniques, both of which are essential for similarity search and semantic analysis. In recent years, however, breakthroughs have been rare, and the performance of current techniques remains far from satisfactory, especially for exact NN search. A recently proposed method, HB, addresses exact NN search efficiently in high-dimensional space. It benefits from cluster-based techniques, which exploit interdimensional correlations to generate a more compact representation of the data set than other techniques. However, HB incurs a high cost for lower bound computations and provides no further pruning scheme for points in candidate clusters. In this paper, we extend the HB method to address exact NN search in correlated, high-dimensional vector data sets extracted from large-scale image databases by introducing two new pruning/selection techniques; we call the resulting method HB+. The first technique selects more quickly the subset of hyperplanes/clusters that must be considered. The second technique prunes irrelevant points in the selected subset of clusters. Experiments show that HB+ improves on HB in efficiency (I/O cost and CPU response time) and outperforms other exact NN indexes.
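The two kinds of pruning named in the abstract can be illustrated with a minimal sketch. It is not the HB or HB+ algorithm itself, whose details are not given here. It assumes a Voronoi partition around given centroids, bounds the query-to-cluster distance with the bisecting hyperplanes between centroids, and prunes individual points with the triangle inequality, a swapped-in rule chosen for illustration. The function names, the `alpha` parameter, and the rule of using only the centroids nearest to the query are all hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(points, centroids):
    """Assign each point to its nearest centroid (a Voronoi partition).

    Each cluster stores (distance_to_own_centroid, point) pairs so the
    search can discard points without computing their distance to the query.
    """
    clusters = [[] for _ in centroids]
    for p in points:
        d = [dist(p, c) for c in centroids]
        i = min(range(len(centroids)), key=d.__getitem__)
        clusters[i].append((d[i], p))
    return clusters


def cluster_lower_bounds(q_to_c, centroids, alpha=1.0):
    """Lower-bound the distance from the query to every cluster.

    The hyperplane bisecting c_i and c_j separates their Voronoi cells; the
    signed distance from the query to it is
    (d(q,c_i)^2 - d(q,c_j)^2) / (2 * d(c_i,c_j)).
    Only the `alpha` fraction of centroids nearest to the query is used as j
    (a hypothetical selection rule). Any subset still gives a valid, possibly
    looser, bound. Centroid-to-centroid distances could be precomputed.
    """
    k = len(centroids)
    m = max(1, math.ceil(alpha * k))
    nearest = sorted(range(k), key=q_to_c.__getitem__)[:m]
    bounds = []
    for i in range(k):
        lb = 0.0
        for j in nearest:
            gap = dist(centroids[i], centroids[j])
            if j != i and gap > 0.0:
                lb = max(lb, (q_to_c[i] ** 2 - q_to_c[j] ** 2) / (2.0 * gap))
        bounds.append(lb)
    return bounds


def exact_nn(query, centroids, clusters, alpha=1.0):
    """Return (nearest_point, distance, number_of_points_compared)."""
    q_to_c = [dist(query, c) for c in centroids]
    bounds = cluster_lower_bounds(q_to_c, centroids, alpha)
    best_d, best_p, compared = math.inf, None, 0
    # Visit clusters in increasing order of their lower bound.
    for i in sorted(range(len(centroids)), key=bounds.__getitem__):
        if bounds[i] >= best_d:
            break  # every remaining cluster is at least this far away
        for d_pc, p in clusters[i]:
            # Triangle inequality: d(q, p) >= |d(q, c_i) - d(p, c_i)|.
            if abs(q_to_c[i] - d_pc) >= best_d:
                continue
            compared += 1
            d = dist(query, p)
            if d < best_d:
                best_d, best_p = d, p
    return best_p, best_d, compared
```

Calling `exact_nn(query, centroids, build_index(points, centroids))` returns the same nearest distance as a linear scan, because a cluster or point is skipped only when its lower bound already rules it out. Lowering `alpha` makes the bounds cheaper to compute but looser, so more clusters may be scanned.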
Notes
In the sequel, we shall use “\(k\)-NN search” to refer to “exact \(k\)-NN search” for simplicity.
In the sequel, we shall use the term “lower bound” to refer to the lower bound distance between a query and a cluster for simplicity.
This concept refers to our suggestion of selecting only a subset of the separating hyperplanes. In the experiments, we evaluate the results empirically while varying \(\alpha\), the proportion of selected separating hyperplanes.
Download from http://archive.ics.uci.edu/ml/datasets/Corel+Image+Features/.
Download from http://vision.ece.ucsb.edu/download.html.
Download from http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm.
Acknowledgments
This project was supported by the National Natural Science Foundation of China (Grant Nos. 61173089, 61202179 and 61472298), the SRF for ROCS, SEM and the Fundamental Research Funds for the Central Universities.
Cite this article
Feng, X., Cui, J., Liu, Y. et al. Effective optimizations of cluster-based nearest neighbor search in high-dimensional space. Multimedia Systems 23, 139–153 (2017). https://doi.org/10.1007/s00530-014-0444-3
DOI: https://doi.org/10.1007/s00530-014-0444-3