
An efficient LSH indexing on discriminative short codes for high-dimensional nearest neighbors

Multimedia Tools and Applications

Abstract

In the era of massive multimedia data, the curse of dimensionality and the I/O performance bottleneck have become two major challenges for disk-based Approximate Nearest Neighbor (ANN) search. Hashing is a popular way to overcome the curse of dimensionality, and one promising hashing technique is Locality Sensitive Hashing (LSH). However, most existing LSH indexes incur significant I/O cost during search because each I/O access hits few NN candidates. We propose a novel method, SC-LSH (SortingCodes-LSH), which combines LSH with another hashing technique (discriminative short codes) to increase the number of NN candidate hits and thereby further boost ANN search performance. First, we build an LSH index and sort all the compound hash keys according to a linear order so that similar NN candidates are stored close together. Then we generate product quantization (PQ) codes and use them as candidates instead of the original data points. These space-efficient short codes let us acquire significantly more candidates with far fewer I/O operations. Moreover, based on theoretical and empirical studies of a series of space-filling curves, we choose the Gray curve as the linear order because it yields a better local distribution of candidate data. Together, these measures significantly increase the NN hits per I/O, which greatly reduces the number of I/O accesses required. Meanwhile, thanks to their good similarity-preserving ability, PQ codes are precise enough to discriminate NNs and thus guarantee accuracy. An empirical study demonstrates that, compared with four state-of-the-art methods, SC-LSH achieves the best accuracy with significantly lower I/O cost and space consumption. Depending on the dataset, the I/O cost (resp., space consumption) of our scheme is only 5%-20% (resp., 1%-20%) of that of the other methods.
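The first step described above (hash each point to a compound LSH key, then sort the keys along a Gray-code order) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes p-stable (Gaussian) LSH functions of the form floor((a·v + b) / w) and one common construction of a Gray order: interleave the bits of the key components and decode the result as a Gray code. All function names and parameters (`make_hash_family`, `compound_key`, `gray_rank`, `build_sorted_index`, `w`, `m`) are hypothetical, and the PQ encoding step is omitted.

```python
import math
import random


def make_hash_family(dim, m, w, seed=0):
    """Draw m p-stable LSH functions: a Gaussian vector a and an offset b in [0, w)."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
            for _ in range(m)]


def compound_key(point, family, w):
    """Compound key: one bucket id floor((a.v + b) / w) per hash function."""
    return tuple(
        math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / w)
        for a, b in family
    )


def interleave_bits(key, bits):
    """Interleave the low `bits` bits of each non-negative component, MSB first."""
    z = 0
    for bit in range(bits - 1, -1, -1):
        for component in key:
            z = (z << 1) | ((component >> bit) & 1)
    return z


def gray_decode(g):
    """Invert the Gray encoding g = n ^ (n >> 1)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def gray_rank(key, bits):
    """Position of a key along the Gray order (assumed construction)."""
    return gray_decode(interleave_bits(key, bits))


def build_sorted_index(points, family, w):
    """Return point indices sorted by the Gray rank of their compound keys."""
    keys = [compound_key(p, family, w) for p in points]
    m = len(family)
    # Shift each component so that all bucket ids are non-negative.
    lows = [min(k[j] for k in keys) for j in range(m)]
    shifted = [tuple(k[j] - lows[j] for j in range(m)) for k in keys]
    bits = max(1, max(max(k) for k in shifted).bit_length())
    order = sorted(range(len(points)), key=lambda i: gray_rank(shifted[i], bits))
    return order, shifted


# Small demo on random 8-dimensional points.
_rng = random.Random(42)
demo_points = [[_rng.uniform(0.0, 10.0) for _ in range(8)] for _ in range(20)]
demo_family = make_hash_family(dim=8, m=3, w=4.0, seed=1)
demo_order, demo_keys = build_sorted_index(demo_points, demo_family, w=4.0)
```

In a disk-based index, points whose keys are adjacent in `demo_order` would be written to the same or neighboring pages, so one page read returns many plausible candidates. Storing short PQ codes in place of the raw vectors, as the abstract describes, would then pack more candidates into each page.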


Notes

  1. In this work, we assume that each component of a data point is a one-word-long integer or float.

  2. http://www.cs.princeton.edu/cass/audio.tar.gz

  3. http://corpus-texmex.irisa.fr/

  4. http://corpus-texmex.irisa.fr/

  5. http://phototour.cs.washington.edu/patches/default.htm


Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61472298, 61672408, 61702403, U1135002), China 111 Project (No. B16037), China Postdoctoral Science Foundation (No. 2018M633473), Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2015JQ6227), SRF for ROCS, SEM, the Fundamental Research Funds for the Central Universities (No. JB170308, etc.) and the Innovation Fund of Xidian University.

Author information


Correspondence to Cui Jiangtao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Xiaokang, F., Jiangtao, C., Hui, L. et al. An efficient LSH indexing on discriminative short codes for high-dimensional nearest neighbors. Multimed Tools Appl 78, 24407–24429 (2019). https://doi.org/10.1007/s11042-018-6987-0


  • DOI: https://doi.org/10.1007/s11042-018-6987-0
