Abstract
Similarity search is a fundamental problem in many multimedia database applications. Because of the “curse of dimensionality”, the performance of many access methods degrades significantly as the dimensionality increases. Approximate similarity search is an alternative, and Locality Sensitive Hashing (LSH) is the most popular method for it. However, LSH must verify a large number of points to obtain good-enough results, which incurs a high I/O cost. In this paper, we propose a new scheme called SortedKey and Early stop LSH (SELSH), which extends the earlier SortingKeys-LSH (SK-LSH). SELSH uses a linear order to sort all the compound hash keys. In addition, during query processing, an early stop condition and a limit on the number of pages are used to decide whether a page needs to be accessed. Our experiments demonstrate the superiority of the proposed method over two state-of-the-art methods, C2LSH and SK-LSH.
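The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the SELSH algorithm itself: the abstract does not specify the linear order, the early stop condition, or the page layout, so everything below is an assumption made for illustration. The sketch assumes p-stable hash functions of the form floor((a·x + b) / w), plain lexicographic order on the compound keys, fixed-size pages, and an early stop rule that halts when a newly read page does not improve the current k-th best distance. All function and parameter names (`make_hash_family`, `compound_key`, `build_index`, `query`, `max_pages`) are hypothetical.

```python
import bisect
import math
import random


def make_hash_family(dim, m, w, seed=0):
    """Draw m hash functions (a, b): a is Gaussian, b is uniform in [0, w)."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
            for _ in range(m)]


def compound_key(point, family, w):
    """Concatenate m hash values floor((a.x + b) / w) into one tuple key."""
    return tuple(
        math.floor((sum(ai * xi for ai, xi in zip(a, point)) + b) / w)
        for a, b in family
    )


def build_index(points, family, w, page_size):
    """Sort (key, point_id) pairs lexicographically and cut them into pages."""
    entries = sorted((compound_key(p, family, w), i) for i, p in enumerate(points))
    pages = [entries[s:s + page_size] for s in range(0, len(entries), page_size)]
    first_keys = [page[0][0] for page in pages]
    return pages, first_keys


def query(q, points, pages, first_keys, family, w, k, max_pages):
    """Return (k best (distance, point_id) pairs found, number of pages read)."""
    if not pages:
        return [], 0
    qkey = compound_key(q, family, w)
    # Start at the last page whose first key is <= the query key.
    start = max(bisect.bisect_right(first_keys, qkey) - 1, 0)
    # Visit pages outward from the start page: start, start+1, start-1, ...
    order = [start]
    for step in range(1, len(pages)):
        for idx in (start + step, start - step):
            if 0 <= idx < len(pages):
                order.append(idx)
    best = []
    pages_read = 0
    for idx in order:
        if pages_read >= max_pages:  # page limit reached
            break
        pages_read += 1
        kth_before = best[k - 1][0] if len(best) >= k else math.inf
        for _, pid in pages[idx]:
            best.append((math.dist(q, points[pid]), pid))
        best = sorted(best)[:k]
        # Assumed early stop rule: this page did not improve the k-th distance.
        if len(best) >= k and best[k - 1][0] >= kth_before:
            break
    return best, pages_read
```

In this sketch, `max_pages` bounds the I/O spent per query, while the assumed stop rule can end the scan earlier. Either choice trades result quality for fewer page reads.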
Acknowledgments
This work is partially supported by the Fundamental Research Funds for the Central Universities of China under grant No. ZYGX2014Z007.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, J., He, C., Hu, G., Shao, J. (2016). SELSH: A Hashing Scheme for Approximate Similarity Search with Early Stop Condition. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9517. Springer, Cham. https://doi.org/10.1007/978-3-319-27674-8_10
DOI: https://doi.org/10.1007/978-3-319-27674-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27673-1
Online ISBN: 978-3-319-27674-8
eBook Packages: Computer Science, Computer Science (R0)