An efficient LSH indexing on discriminative short codes for high-dimensional nearest neighbors

Xiaokang, Feng; Jiangtao, Cui; Hui, Li; Yingfan, Liu

doi:10.1007/s11042-018-6987-0

Published: 07 January 2019

Volume 78, pages 24407–24429, (2019)

Multimedia Tools and Applications

Feng Xiaokang¹,
Cui Jiangtao ORCID: orcid.org/0000-0001-5569-0780¹,
Li Hui² &
…
Liu Yingfan³

Abstract

In the era of massive multimedia data, the curse of dimensionality and the I/O performance bottleneck have become two major challenges for disk-based Approximate Nearest Neighbor (ANN) search. Hashing is a popular way to overcome the curse of dimensionality, and one promising hashing technique is Locality Sensitive Hashing (LSH). However, most existing LSH indexes incur significant I/O cost during search because each I/O access hits few NN candidates. We propose a novel method, SC-LSH (SortingCodes-LSH), which combines LSH with another hashing technique (i.e., discriminative short codes) to raise the number of NN candidates hit per I/O access and thereby further boost ANN search performance. First, we build an LSH index and sort all the compound hash keys according to a linear order so that similar NN candidates are stored close together. Then we generate product quantization (PQ) codes and use them as candidates in place of the original data points. These space-efficient short codes let us acquire significantly more candidates with far fewer I/O operations. Moreover, based on theoretical and empirical studies of a series of space-filling curves, we choose the Gray curve as the linear order because it produces a better local distribution of candidate data. Together, these techniques significantly increase the NN hits per I/O access, which greatly reduces the number of I/O accesses needed. Meanwhile, because PQ codes preserve similarity well, they are precise enough to discriminate NNs and thus guarantee accuracy. An empirical study demonstrates that, compared with four state-of-the-art methods, SC-LSH achieves the best accuracy with significantly lower I/O cost and space consumption. In fact, depending on the dataset, the I/O cost (resp., space consumption) of our scheme is only 5%-20% (resp., 1%-20%) of that of the other methods.
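The pipeline described in the abstract can be illustrated with a small in-memory sketch. It is not the authors' implementation: the abstract does not give the exact construction, so the sketch assumes p-stable (Gaussian) LSH functions, a Gray order obtained by bit-interleaving the key components and decoding the result as a reflected Gray codeword, and PQ codebooks seeded from sampled points rather than trained with k-means. All function names, parameter values, and the window size are hypothetical.

```python
import bisect
import math
import random

def compound_key(v, funcs, w):
    """Compound LSH key: one bucket id floor((a.v + b) / w) per hash function."""
    return tuple(math.floor((sum(x * y for x, y in zip(a, v)) + b) / w)
                 for a, b in funcs)

def gray_rank(key, bits):
    """Rank of a key with non-negative components under a Gray-curve order."""
    z = 0
    for i in range(bits - 1, -1, -1):       # interleave bits, most significant first
        for k in key:
            z = (z << 1) | ((k >> i) & 1)
    rank = 0
    while z:                                 # decode z as a reflected Gray codeword
        rank ^= z
        z >>= 1
    return rank

def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def pq_encode(v, codebooks):
    """PQ code: index of the nearest centroid in each sub-space."""
    sub = len(v) // len(codebooks)
    return tuple(min(range(len(cb)),
                     key=lambda j: sq_dist(v[i * sub:(i + 1) * sub], cb[j]))
                 for i, cb in enumerate(codebooks))

def pq_distance(q, code, codebooks):
    """Asymmetric squared distance between a raw query and a PQ code."""
    sub = len(q) // len(codebooks)
    return sum(sq_dist(q[i * sub:(i + 1) * sub], cb[c])
               for i, (cb, c) in enumerate(zip(codebooks, code)))

rng = random.Random(0)
dim, m, w, n_sub, n_centroids = 8, 3, 4.0, 2, 4    # illustrative values only
data = [[rng.uniform(0, 10) for _ in range(dim)] for _ in range(50)]
funcs = [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, w))
         for _ in range(m)]

keys = [compound_key(v, funcs, w) for v in data]
lo = min(min(k) for k in keys)                      # shift so components are >= 0
keys = [tuple(c - lo for c in k) for k in keys]
bits = max(1, max(max(k) for k in keys).bit_length())

# Assumption: codebooks seeded from sampled sub-vectors (stand-in for k-means).
sub = dim // n_sub
codebooks = [[p[i * sub:(i + 1) * sub] for p in rng.sample(data, n_centroids)]
             for i in range(n_sub)]

# Index: PQ codes stored in Gray order of their compound keys.
order = sorted(range(len(data)), key=lambda i: gray_rank(keys[i], bits))
index = [(keys[i], pq_encode(data[i], codebooks)) for i in order]
ranks = [gray_rank(k, bits) for k, _ in index]

# Query: jump to the query key's position and rank a contiguous window of codes.
q = data[0]
pos = bisect.bisect_left(ranks, gray_rank(keys[0], bits))
window = index[max(0, pos - 5):pos + 5]
best = min(window, key=lambda e: pq_distance(q, e[1], codebooks))
```

In this sketch, entries at consecutive Gray ranks have interleaved keys that differ in exactly one bit, which is the locality property that motivates sorting along a Gray curve. A disk-based index would read the window as one or a few sequential pages of short codes instead of slicing a Python list.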

Notes

In this work, we assume that each component of a data point is a one-word integer or float.
http://www.cs.princeton.edu/cass/audio.tar.gz
http://corpus-texmex.irisa.fr/
http://corpus-texmex.irisa.fr/
http://phototour.cs.washington.edu/patches/default.htm

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61472298, 61672408, 61702403, U1135002), China 111 Project (No. B16037), China Postdoctoral Science Foundation (No. 2018M633473), Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2015JQ6227), SRF for ROCS, SEM, the Fundamental Research Funds for the Central Universities (No. JB170308, etc.) and the Innovation Fund of Xidian University.

Author information

Authors and Affiliations

School of Computer Science and Technology, Xidian University, Xi’an, China
Feng Xiaokang & Cui Jiangtao
School of Cyber Engineering, Xidian University, Xi’an, China
Li Hui
Department of System Engineering and Engineering Management, Chinese University of Hong Kong, Hong Kong, China
Liu Yingfan

Corresponding author

Correspondence to Cui Jiangtao.

About this article

Cite this article

Xiaokang, F., Jiangtao, C., Hui, L. et al. An efficient LSH indexing on discriminative short codes for high-dimensional nearest neighbors. Multimed Tools Appl 78, 24407–24429 (2019). https://doi.org/10.1007/s11042-018-6987-0

Received: 25 June 2018
Revised: 29 October 2018
Accepted: 28 November 2018
Published: 07 January 2019
Issue Date: 15 September 2019
DOI: https://doi.org/10.1007/s11042-018-6987-0
