Locality Sensitive Hashing Based Clustering

Jin, Xin; Han, Jiawei

doi:10.1007/978-1-4899-7687-1_950



The basic idea of the LSH (Gionis et al. 1999) technique is to hash the data points with multiple hash functions so that points close to each other collide with high probability, while dissimilar points collide with low probability. LSH schemes exist for many distance measures, such as the Hamming norm, L_p norms, cosine distance, earth mover's distance (EMD), and the Jaccard coefficient.
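The collision behavior described above can be illustrated with a minimal sketch. It assumes the random-hyperplane scheme for cosine distance, which is one common LSH family and not necessarily the one the entry has in mind; the names `make_hash_family`, `hash_bits`, and `collision_rate` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def make_hash_family(dim, n_hashes, seed=0):
    """Draw n_hashes random hyperplanes; each one defines a 1-bit hash."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_hashes, dim))

def hash_bits(planes, x):
    """One bit per hyperplane: which side of the plane x falls on."""
    return (planes @ x >= 0).astype(int)

def collision_rate(planes, a, b):
    """Fraction of hash functions on which a and b receive the same bit."""
    return float(np.mean(hash_bits(planes, a) == hash_bits(planes, b)))

planes = make_hash_family(dim=50, n_hashes=2000, seed=0)
rng = np.random.default_rng(1)
a = rng.standard_normal(50)
near = a + 0.1 * rng.standard_normal(50)  # small perturbation of a
far = rng.standard_normal(50)             # unrelated random point

near_rate = collision_rate(planes, a, near)
far_rate = collision_rate(planes, a, far)
print(f"near: {near_rate:.3f}, far: {far_rate:.3f}")
```

Under these assumptions the nearby point should agree with `a` on most hash bits, while the unrelated point should agree on roughly half of them.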

In LSH, a family H = {h : S → U} is called locality-sensitive if, for any a, the function $p(x) = \Pr_{H}[h(a) = h(b) : \|a - b\| = x]$ is decreasing in x. Under this definition, the probability that points a and b collide decreases as the distance between them grows.
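As a worked instance of this definition, consider the standard bit-sampling family on binary vectors of length d under Hamming distance. This example is an illustration added here, not taken from the entry, and the symbols $h_i$ and $d$ are assumptions of the example. Each hash function returns a single coordinate, and a uniformly chosen coordinate agrees on a and b exactly when it is not one of the x positions where they differ:

```latex
H = \{\, h_i : h_i(a) = a_i,\ i = 1, \dots, d \,\}, \qquad
\Pr_{H}[h(a) = h(b) : \|a - b\| = x] = \frac{d - x}{d} = 1 - \frac{x}{d}.
```

The collision probability is therefore decreasing in x. For d = 8, two vectors at distance x = 2 collide with probability 0.75, while two vectors at distance x = 6 collide with probability 0.25.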

Although LSH was originally proposed for approximate nearest neighbor search in high dimensions, it can be used for clustering as well (Das et al. 2007; Haveliwala et al. 2000). The hash buckets can serve as the basis for clustering. Seeding the hash functions several times can help in getting better...
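Using buckets as the basis for clustering might be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the cited papers: it uses random-hyperplane signatures, treats each distinct signature as one bucket, and the function `lsh_bucket_clusters` with its `n_bits` and `seed` parameters is hypothetical.

```python
from collections import defaultdict
import numpy as np

def lsh_bucket_clusters(points, n_bits=4, seed=0):
    """Group the row vectors of `points` by random-hyperplane signature."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, points.shape[1]))
    bits = points @ planes.T >= 0            # shape (n_points, n_bits)
    buckets = defaultdict(list)
    for idx, row in enumerate(bits):
        key = tuple(int(b) for b in row)     # the signature is the bucket id
        buckets[key].append(idx)
    return dict(buckets)

# Toy data: two tight groups pointing in opposite directions.
rng = np.random.default_rng(42)
center = rng.standard_normal(20)
group_a = center + 0.01 * rng.standard_normal((5, 20))
group_b = -center + 0.01 * rng.standard_normal((5, 20))
points = np.vstack([group_a, group_b])       # rows 0-4: group a, rows 5-9: group b

clusters = lsh_bucket_clusters(points, n_bits=4, seed=0)
for signature, members in clusters.items():
    print(signature, members)
```

Each bucket is a candidate cluster. A different `seed` draws different hyperplanes and so may partition the same data differently; a tight group can occasionally be split across buckets when a hyperplane passes close to it.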


Author information

Authors and Affiliations

PayPal Inc., San Jose, CA, USA
Xin Jin
University of Illinois at Urbana-Champaign, Urbana, IL, USA
Jiawei Han


Editor information

Editors and Affiliations

The University of New South Wales, Sydney, NSW, Australia
Claude Sammut
Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
Geoffrey I. Webb


Cite this entry

Jin, X., Han, J. (2017). Locality Sensitive Hashing Based Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_950


DOI: https://doi.org/10.1007/978-1-4899-7687-1_950
Published: 14 April 2017
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
