Fast-join: An efficient method for fuzzy token matching based string similarity join | IEEE Conference Publication | IEEE Xplore

Abstract:

String similarity join, which finds similar string pairs between two string sets, is an essential operation in many applications and has recently attracted significant attention in the database community. A significant challenge in similarity join is implementing an effective fuzzy match operation that finds all similar string pairs, including pairs that do not match exactly. In this paper, we propose a new similarity metric, called “fuzzy token matching based similarity”, which extends token-based similarity functions (e.g., Jaccard similarity and Cosine similarity) by allowing fuzzy matches between two tokens. We study the problem of similarity join using this new similarity metric and present a signature-based method to address it. We propose new signature schemes and develop effective pruning techniques to improve performance. Experimental results show that our approach achieves high efficiency and result quality, and significantly outperforms state-of-the-art methods.
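To make the idea of fuzzy matching between tokens concrete, the following is a minimal Python sketch of a Jaccard-style score in which two tokens count as (partially) matching when their normalized edit similarity reaches a threshold. It is an illustration under stated assumptions, not the paper's definition or algorithm: the function names, the threshold `delta`, the whitespace tokenizer, and the greedy one-to-one token pairing are all hypothetical choices made here, and the signature schemes and pruning techniques mentioned in the abstract are not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical tokens."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def fuzzy_overlap(tokens_a, tokens_b, delta=0.8):
    """Sum of similarities over a greedy one-to-one pairing of tokens.

    Assumption: a greedy pairing (best pairs first) stands in for whatever
    matching the paper actually uses; it is only an approximation.
    """
    candidates = []
    for i, ta in enumerate(tokens_a):
        for j, tb in enumerate(tokens_b):
            sim = edit_similarity(ta, tb)
            if sim >= delta:  # pairs below the threshold do not match
                candidates.append((sim, i, j))
    candidates.sort(key=lambda c: -c[0])

    used_a, used_b = set(), set()
    total = 0.0
    for sim, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += sim
    return total


def fuzzy_jaccard(s: str, t: str, delta: float = 0.8) -> float:
    """Jaccard-style score with the exact overlap replaced by fuzzy_overlap."""
    tokens_a, tokens_b = s.lower().split(), t.lower().split()
    if not tokens_a and not tokens_b:
        return 1.0
    overlap = fuzzy_overlap(tokens_a, tokens_b, delta)
    return overlap / (len(tokens_a) + len(tokens_b) - overlap)
```

Under these assumptions, `fuzzy_jaccard("database join", "databse join")` is about 0.88, whereas exact-token Jaccard on the same pair is 1/3 because the misspelled token "databse" shares nothing with "database" under exact matching.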
Date of Conference: 11-16 April 2011
Date Added to IEEE Xplore: 16 May 2011
Conference Location: Hannover, Germany
