Efficient Hybrid Algorithms for Computing Clusters Overlap

https://doi.org/10.1016/j.procs.2017.05.212
Open access under a Creative Commons license

Abstract

Every year, marketers target different segments of the population with many advertisements. However, millions of dollars are wasted when different advertisements are aimed at segments that largely overlap. Measuring the similarity between segments is itself expensive, because a single segment can contain millions or even billions of members. In this project, we develop a fast probabilistic algorithm, described in Section 3, that estimates the similarity between large segments with higher accuracy than other known methods.
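MinHash, listed among the keywords below, is one standard probabilistic way to estimate the Jaccard similarity |A∩B| / |A∪B| of two large segments without materializing their intersection. The sketch below is a generic illustration of MinHash only, not the hybrid algorithm of Section 3; the function names, the use of salted BLAKE2b as the hash family, and the signature length are assumptions chosen for illustration.

```python
import hashlib


def _hash(value: str, seed: int) -> int:
    """64-bit hash of `value`; each seed acts as an independent hash function.

    Assumption: salted BLAKE2b stands in for a family of random hash functions.
    """
    digest = hashlib.blake2b(
        value.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=256):
    """For each hash function, keep the minimum hash value seen over the set."""
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of positions where the minima agree estimates |A∩B| / |A∪B|."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

For example, two hypothetical segments `{"user0", ..., "user1999"}` and `{"user1000", ..., "user2999"}` have exact Jaccard similarity 1000 / 3000 ≈ 0.33, and their 256-entry signatures should give an estimate near that value. The standard error shrinks roughly as 1/√(num_hashes), and each signature has a fixed size no matter how large the segment is.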

Keywords

Clustering
Jaccard similarity
Inclusion-Exclusion algorithm
MinHash
HyperLogLog
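The keywords also pair HyperLogLog with inclusion-exclusion. One common way to combine them, shown here as an assumption rather than as the paper's method, is to estimate |A|, |B| and |A∪B| from HyperLogLog sketches (the union sketch is the register-wise maximum) and then apply |A∩B| = |A| + |B| − |A∪B|. The precision `P`, the hash function, and all names below are hypothetical choices for this sketch.

```python
import hashlib
import math

P = 12       # hypothetical precision: 2**12 = 4096 registers (~1.6% std. error)
M = 1 << P


def _hash64(value: str) -> int:
    """64-bit hash; assumption: BLAKE2b is an adequate uniform hash here."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def hll_sketch(items):
    """Build HyperLogLog registers holding the max leading-zero rank per bucket."""
    registers = [0] * M
    for x in items:
        h = _hash64(x)
        idx = h >> (64 - P)                       # top P bits pick the register
        rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1   # 1-based position of leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers


def hll_merge(r1, r2):
    """Sketch of the union: register-wise maximum."""
    return [max(a, b) for a, b in zip(r1, r2)]


def hll_count(registers):
    """Standard HyperLogLog estimate with linear counting for small cardinalities."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)
    return raw


def jaccard_inclusion_exclusion(reg_a, reg_b):
    """Estimate |A∩B| / |A∪B| using |A∩B| = |A| + |B| - |A∪B|."""
    n_a, n_b = hll_count(reg_a), hll_count(reg_b)
    n_union = hll_count(hll_merge(reg_a, reg_b))
    n_inter = max(0.0, n_a + n_b - n_union)   # clamp: estimates can go negative
    return n_inter / n_union if n_union else 0.0
```

One design caveat: the intersection inherits the absolute errors of three cardinality estimates, so this approach loses accuracy when the overlap is small relative to the union. That weakness is one reason to combine cardinality sketches with a direct similarity estimator such as MinHash.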
