Locality Sensitive Hashing Based Clustering

  • Reference work entry
Encyclopedia of Machine Learning and Data Mining

The basic idea of the LSH technique (Gionis et al. 1999) is to hash the data points with multiple hash functions, chosen so that points that are close to each other collide with high probability while dissimilar points collide with low probability. LSH schemes exist for many distance measures, such as the Hamming norm, \(L_p\) norms, cosine distance, earth mover's distance (EMD), and the Jaccard coefficient.
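
As a concrete instance of the Jaccard-coefficient case, the sketch below uses MinHash. Under a random permutation of the universe, two sets have the same minimum element with probability equal to their Jaccard coefficient, so similar sets collide often and dissimilar sets rarely. This is a minimal sketch, assuming seeded BLAKE2b hashes as a stand-in for random permutations. The function names (`seeded_hash`, `minhash_signature`, `estimated_jaccard`) and the toy word sets are hypothetical, chosen for illustration.

```python
import hashlib


def seeded_hash(seed: int, item: str) -> int:
    """64-bit BLAKE2b hash of `item` salted with `seed` (stand-in for a random permutation)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items: set, num_hashes: int = 128) -> list:
    """Minimum hash value of the set under each of `num_hashes` seeded hash functions."""
    return [min(seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of hash functions under which the two sets collide."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard coefficient, for comparison with the estimate."""
    return len(a & b) / len(a | b)


doc_a = {f"w{i}" for i in range(0, 100)}
doc_b = {f"w{i}" for i in range(10, 110)}   # shares 90 of 110 distinct words with doc_a
doc_c = {f"w{i}" for i in range(500, 600)}  # disjoint from doc_a

sig_a, sig_b, sig_c = (minhash_signature(d) for d in (doc_a, doc_b, doc_c))
print(jaccard(doc_a, doc_b), estimated_jaccard(sig_a, sig_b))  # similar pair: frequent collisions
print(jaccard(doc_a, doc_c), estimated_jaccard(sig_a, sig_c))  # dissimilar pair: almost none
```

The fraction of colliding signature positions is an unbiased estimate of the Jaccard coefficient (for truly random permutations). Using more hash functions reduces its variance at the cost of longer signatures.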

In LSH, a family \(\mathcal{H} = \{h : S \to U\}\) is called locality-sensitive if, for any point \(a\), the function \(p(x) = \Pr_{h \in \mathcal{H}}[h(a) = h(b) : \Vert a - b\Vert = x]\) is decreasing in \(x\). By this definition, the probability that points \(a\) and \(b\) collide decreases as the distance between them grows.
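
A standard family that satisfies this definition is bit sampling for the Hamming distance on \(d\)-bit vectors. Each \(h_i \in \mathcal{H}\) returns coordinate \(i\). If \(a\) and \(b\) differ in \(x\) coordinates, exactly \(d - x\) of the \(d\) functions agree, so \(p(x) = 1 - x/d\), which is decreasing in \(x\). The sketch below checks this by enumerating the whole (finite) family rather than sampling from it. The names `bit_sampling_family` and `collision_probability` are hypothetical.

```python
from fractions import Fraction


def bit_sampling_family(d: int):
    """H = {h_i : a -> a[i] | 0 <= i < d}; each function reads a single coordinate."""
    return [lambda a, i=i: a[i] for i in range(d)]


def collision_probability(a, b, family) -> Fraction:
    """Pr[h(a) == h(b)] for h drawn uniformly from the finite family."""
    hits = sum(1 for h in family if h(a) == h(b))
    return Fraction(hits, len(family))


def hamming(a, b) -> int:
    """Number of coordinates in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


d = 8
family = bit_sampling_family(d)
a = (0,) * d
points = [(1,) * x + (0,) * (d - x) for x in range(d + 1)]  # points[x] is at distance x from a
probs = [collision_probability(a, b, family) for b in points]

for b, p in zip(points, probs):
    print(hamming(a, b), p)  # p equals 1 - x/d, so it falls as the distance x grows
```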

Although LSH was originally proposed for approximate nearest-neighbor search in high dimensions, it can also be used for clustering (Das et al. 2007; Haveliwala et al. 2000): the hash buckets can serve as the basis for clusters. Seeding the hash functions several times can help obtain better...
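
The following is a minimal sketch of one way buckets can seed a clustering. It assumes set-valued data compared by Jaccard similarity and MinHash bucket keys. Each of `num_tables` hash tables uses a freshly seeded key made of `rows` concatenated min-hashes (one reading of "seeding the hash functions several times"). Points that share a bucket in any table are merged with union-find, so the clusters are connected components. The names `lsh_clusters`, `bucket_key`, `num_tables`, and `rows` are hypothetical. The union-find merge is an illustrative choice and is not necessarily the procedure used by Das et al. (2007) or Haveliwala et al. (2000).

```python
import hashlib
from collections import defaultdict


def seeded_hash(seed: int, item: str) -> int:
    """64-bit BLAKE2b hash of `item` salted with `seed`."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def bucket_key(items: set, table: int, rows: int) -> tuple:
    """Concatenate `rows` min-hashes; every table uses its own, fresh seeds."""
    return tuple(min(seeded_hash(table * rows + r, x) for x in items) for r in range(rows))


def lsh_clusters(sets: list, num_tables: int = 10, rows: int = 3) -> list:
    """Link points that share a bucket in any table; return the connected components."""
    parent = list(range(len(sets)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for table in range(num_tables):
        buckets = defaultdict(list)
        for idx, s in enumerate(sets):
            buckets[bucket_key(s, table, rows)].append(idx)
        for members in buckets.values():
            for other in members[1:]:
                parent[find(other)] = find(members[0])  # union with the bucket's first member

    clusters = defaultdict(list)
    for idx in range(len(sets)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


docs = [
    {f"a{i}" for i in range(0, 50)},
    {f"a{i}" for i in range(2, 52)},  # near-duplicate of docs[0] (Jaccard 48/52)
    {f"b{i}" for i in range(0, 50)},
    {f"b{i}" for i in range(1, 51)},  # near-duplicate of docs[2] (Jaccard 49/51)
]
print(lsh_clusters(docs))
```

Assume the seeded hashes behave like independent random permutations, and let \(J\) be the Jaccard coefficient of two sets. In a single table the sets share a key with probability \(J^{r}\) for \(r\) = `rows`. Across \(L\) = `num_tables` tables, the chance that they meet at least once is \(1 - (1 - J^{r})^{L}\). Raising `rows` therefore makes spurious merges rarer, while raising `num_tables` makes it more likely that truly similar points end up in the same cluster.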

Recommended Reading

  • Das AS, Datar M, Garg A, Rajaram S (2007) Google news personalization: scalable online collaborative filtering. In: Proceedings of the 16th international conference on world wide web (WWW’07). ACM, New York, pp 271–280

  • Gionis A, Indyk P, Motwani R (1999) Similarity search in high dimensions via hashing. In: Proceedings of the 25th international conference on very large data bases (VLDB’99). Morgan Kaufmann Publishers, San Francisco, pp 518–529

  • Haveliwala TH, Gionis A, Indyk P (2000) Scalable techniques for clustering the web (extended abstract). In: Proceedings of the third international workshop on the web and databases. Stanford University, Stanford, pp 129–134

Copyright information

© 2017 Springer Science+Business Media New York

About this entry

Cite this entry

Jin, X., Han, J. (2017). Locality Sensitive Hashing Based Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_950
