Abstract
Hashing encodes data points as binary codes while trying to preserve the neighborhood structure of the original feature space, enabling efficient approximate nearest neighbor search in large-scale databases. Existing hashing methods usually adopt a two-stage strategy (a projection stage followed by a quantization stage) to encode data points, with threshold-based single-bit quantization (SBQ) binarizing each projected dimension into 0 or 1. Similarity between hash codes is measured by their Hamming distance. However, SBQ may destroy the original neighborhood structure by quantizing neighboring points that lie near the threshold into different binary values. Double-bit quantization (DBQ) and its derivative, Manhattan hashing, have been proposed to fix this problem. Experimental results showed that Manhattan hashing outperformed state-of-the-art methods in terms of effectiveness, but it lost the efficiency advantage because it measured similarity between hash codes with decimal arithmetic instead of fast bitwise operations. In this paper, we propose a strategy for accelerating Manhattan hashing by making full use of bitwise operations. Our main contributions are: 1) a new encoding method that assigns location information to each binary digit, avoiding the time-consuming decimal arithmetic; and 2) a novel hash code distance measure that accelerates the calculation of the Manhattan distance and thereby improves query efficiency. Extensive experiments on three benchmark datasets show that our approach speeds up querying on 2-bit, 3-bit and 4-bit quantized hash codes by at least one order of magnitude on average, without any loss of precision.
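To make the boundary problem of SBQ and the decimal-arithmetic baseline concrete, the following is a minimal sketch rather than the method proposed in the paper. The function names, the fixed thresholds, and the sample values are all hypothetical; real methods learn projections and thresholds from data. It shows two neighboring projected values receiving different single bits, and a Manhattan distance computed over per-dimension region indices with ordinary integer arithmetic.

```python
def sbq(value, threshold=0.0):
    """Single-bit quantization: 1 if value is at or above the threshold, else 0."""
    return 1 if value >= threshold else 0


def region_index(value, thresholds):
    """Multi-bit quantization sketch: index of the region that contains value.

    `thresholds` is assumed sorted; q bits per dimension would use
    2**q - 1 thresholds (an assumption made for this illustration).
    """
    return sum(value >= t for t in thresholds)


def manhattan_distance(code_a, code_b):
    """Baseline distance: sum of absolute differences of region indices.

    This is the integer ("decimal") arithmetic that the abstract contrasts
    with bitwise operations; it is not the accelerated measure.
    """
    return sum(abs(x - y) for x, y in zip(code_a, code_b))


# Hypothetical projected values: a and b are neighbors, c is far from both.
a, b, c = -0.01, 0.01, -3.0

# SBQ separates the neighbors a and b, yet gives a and c the same bit.
sbq_bits = [sbq(v) for v in (a, b, c)]

# With three assumed thresholds (2 bits per dimension) c lands in its own region.
thresholds = [-1.0, 0.0, 1.0]
regions = [region_index(v, thresholds) for v in (a, b, c)]
```

With these assumed values, SBQ yields bits 0, 1, 0, so the far point c looks identical to a while the neighbor b does not. The 2-bit region indices are 1, 2, 0, so a is no longer indistinguishable from c, and b is farther from c than a is.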
Notes
The method bit-count(n) counts the number of '1' bits in the binary representation of n; this is also known as computing n's Hamming weight.
Code is available at http://ise.thss.tsinghua.edu.cn/MIG/resources.jsp
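The bit-count operation described in the first note can be sketched as follows. This is an illustrative implementation under our own naming (`bit_count`, `hamming_distance`), not the authors' code; it uses the well-known trick of repeatedly clearing the lowest set bit, and combines it with XOR to obtain the Hamming distance between two binary codes stored as non-negative integers.

```python
def bit_count(n):
    """Return the Hamming weight of a non-negative integer n."""
    count = 0
    while n:
        n &= n - 1  # clears the lowest '1' bit of n
        count += 1
    return count


def hamming_distance(code_a, code_b):
    """Number of bit positions in which two integer-encoded codes differ."""
    return bit_count(code_a ^ code_b)  # XOR marks the differing positions
```

For example, `bit_count(0b1011)` is 3 and `hamming_distance(0b1100, 0b1010)` is 2. In Python 3.10 and later the built-in `int.bit_count()` gives the same result, and many CPUs provide a dedicated population-count instruction.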
Acknowledgments
This research was supported by the National Natural Science Foundation of China (Grant Nos. 61271394 and 61571269). The authors would like to thank the anonymous reviewers for their valuable comments.
Cite this article
Chen, W., Ding, G., Lin, Z. et al. Accelerated Manhattan hashing via bit-remapping with location information. Multimed Tools Appl 76, 2441–2466 (2017). https://doi.org/10.1007/s11042-015-3217-x
DOI: https://doi.org/10.1007/s11042-015-3217-x