The basic idea of locality-sensitive hashing (LSH; Gionis et al. 1999) is to hash the data points with multiple hash functions chosen so that points close to each other collide with high probability, while dissimilar points collide with low probability. LSH schemes exist for many distance measures, such as Hamming distance, \(L_p\) norms, cosine distance, earth mover's distance (EMD), and the Jaccard coefficient.
In LSH, a family \(H = \{h : S \to U\}\) is called locality-sensitive if, for any point a, the function \(p(x) = \Pr_{H}[h(a) = h(b) : \Vert a - b\Vert = x]\) is decreasing in x. In other words, the probability that points a and b collide decreases as the distance between them grows.
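The definition can be checked on a small example. The following is a minimal sketch, assuming binary vectors under Hamming distance and a bit-sampling family in which each hash function returns one coordinate of its input. The names `bit_sampling_family`, `hamming`, and `collision_probability` are hypothetical and chosen only for illustration. Because the family is small, the collision probability is computed exactly by enumerating every hash function rather than by sampling.

```python
def bit_sampling_family(dim):
    """Return the full bit-sampling family: h_i(x) = x[i] for each coordinate i."""
    return [lambda x, i=i: x[i] for i in range(dim)]


def hamming(a, b):
    """Number of coordinates in which a and b differ."""
    return sum(u != v for u, v in zip(a, b))


def collision_probability(a, b, family):
    """Exact Pr_H[h(a) = h(b)] when h is drawn uniformly from `family`."""
    return sum(h(a) == h(b) for h in family) / len(family)


dim = 16
family = bit_sampling_family(dim)
a = [0] * dim

# probs[x] is the collision probability for a point at Hamming distance x from a.
probs = []
for x in range(dim + 1):
    b = [1] * x + [0] * (dim - x)  # differs from a in exactly x coordinates
    assert hamming(a, b) == x
    probs.append(collision_probability(a, b, family))
```

Under these assumptions `probs[x]` equals `1 - x / dim`, so the collision probability falls as the distance x grows, which is the property the definition requires.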
Although LSH was originally proposed for approximate nearest neighbor search in high dimensions, it can also be used for clustering (Das et al. 2007; Haveliwala et al. 2000). The hash buckets can serve as the basis for the clusters. Seeding the hash functions several times can help getting better...
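One way to read "buckets as the basis for clustering" is sketched below. This is an illustrative assumption, not the procedure of the cited papers: points are binary tuples, a signature concatenates `k` randomly sampled coordinates, and each bucket of identical signatures is treated as a candidate cluster. The function names and the parameters `k` and `seed` are hypothetical.

```python
import random
from collections import defaultdict


def make_signature_fn(dim, k, seed):
    """Pick k random coordinates; the signature concatenates those bits."""
    rng = random.Random(seed)
    coords = [rng.randrange(dim) for _ in range(k)]
    return lambda x: tuple(x[i] for i in coords)


def lsh_buckets(points, dim, k=4, seed=0):
    """Group point indices by signature; each bucket is a candidate cluster."""
    signature = make_signature_fn(dim, k, seed)
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        buckets[signature(p)].append(idx)
    return list(buckets.values())


points = [
    (0, 0, 0, 0, 0, 0, 0, 0),
    (1, 0, 0, 0, 0, 0, 0, 0),  # one bit away from point 0
    (1, 1, 1, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 1, 1, 1, 0),  # one bit away from point 2
]
clusters = lsh_buckets(points, dim=8)
```

Points 0 and 2 differ in every coordinate, so they can never share a bucket, whereas each near-duplicate pair shares a bucket unless one of the sampled coordinates happens to be the single bit in which the pair differs. A larger `k` makes buckets more selective at the cost of splitting similar points more often.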
Recommended Reading
Das AS, Datar M, Garg A, Rajaram S (2007) Google news personalization: scalable online collaborative filtering. In: Proceedings of the 16th international conference on world wide web (WWW’07). ACM, New York, pp 271–280
Gionis A, Indyk P, Motwani R (1999) Similarity search in high dimensions via hashing. In: Proceedings of the 25th international conference on very large data bases (VLDB’99). Morgan Kaufmann Publishers, San Francisco, pp 518–529
Haveliwala TH, Gionis A, Indyk P (2000) Scalable techniques for clustering the web (extended abstract). In: Proceedings of the third international workshop on the web and databases. Stanford University, Stanford, pp 129–134
© 2017 Springer Science+Business Media New York
Jin, X., Han, J. (2017). Locality Sensitive Hashing Based Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_950
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1