Abstract:
In this paper, we present LSHDB, the first parallel and distributed engine for record linkage and similarity search. LSHDB provides an abstraction layer that hides the mechanics of Locality-Sensitive Hashing (a popular method for detecting similar items in high dimensions), which it uses as the underlying similarity search engine. LSHDB creates the appropriate data structures from the input data and persists these structures on disk using a NoSQL engine. It inherently supports the parallel processing of distributed queries, is highly extensible, and is easy to use. We will demonstrate LSHDB both as the underlying system for detecting similar records in the context of Record Linkage (and of Privacy-Preserving Record Linkage) tasks, and as a search engine for identifying string values that are similar to submitted queries.
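To make the Locality-Sensitive Hashing idea mentioned above concrete, the following is a minimal sketch of one common LSH scheme: MinHash over character bigrams with banding. It is not LSHDB's API or its actual implementation. The class name `MinHashLSHIndex`, its parameters (`bands`, `rows`, `threshold`), and the choice of MinHash are assumptions made for illustration. The sketch also keeps its hash tables in memory, whereas the abstract describes persisting them on disk through a NoSQL engine.

```python
import hashlib
from collections import defaultdict


def bigrams(text):
    """Return the set of character bigrams of a padded, lowercased string."""
    s = f"_{text.lower()}_"  # padding guarantees at least one bigram
    return {s[i:i + 2] for i in range(len(s) - 1)}


def seeded_hash(seed, token):
    """Deterministic 64-bit hash of a token; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: the minimum hash value of the token set under each seed."""
    return [min(seeded_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


class MinHashLSHIndex:
    """Hypothetical in-memory LSH index (illustrative only, not LSHDB's design)."""

    def __init__(self, bands=16, rows=2):
        self.bands = bands
        self.rows = rows
        # One hash table per band: band key -> set of record ids.
        self.tables = [defaultdict(set) for _ in range(bands)]
        self.records = {}

    def _band_keys(self, tokens):
        sig = minhash_signature(tokens, self.bands * self.rows)
        for b in range(self.bands):
            yield b, tuple(sig[b * self.rows:(b + 1) * self.rows])

    def insert(self, record_id, text):
        tokens = bigrams(text)
        self.records[record_id] = tokens
        for b, key in self._band_keys(tokens):
            self.tables[b][key].add(record_id)

    def query(self, text, threshold=0.5):
        """Return (record_id, jaccard) pairs at or above the threshold, best first."""
        tokens = bigrams(text)
        candidates = set()
        for b, key in self._band_keys(tokens):
            candidates |= self.tables[b].get(key, set())
        # Verify candidates with the exact Jaccard similarity.
        results = []
        for rid in candidates:
            other = self.records[rid]
            jaccard = len(tokens & other) / len(tokens | other)
            if jaccard >= threshold:
                results.append((rid, jaccard))
        return sorted(results, key=lambda pair: -pair[1])


if __name__ == "__main__":
    index = MinHashLSHIndex()
    index.insert("r1", "john smith")
    index.insert("r2", "jon smith")
    index.insert("r3", "mary jones")
    print(index.query("john smith"))
```

Records that share at least one band key become candidates, so only a small fraction of the stored records is compared exactly. Increasing `rows` makes each band more selective, while increasing `bands` raises the chance that truly similar records collide in at least one table.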
Notes: This article was mistakenly omitted from the original submission to IEEE Xplore. It is now included as part of the conference record.
Date of Conference: 12-15 December 2016
Date Added to IEEE Xplore: 02 March 2017
ISBN Information:
Electronic ISSN: 2375-9259