Abstract
Jaro similarity is a widely used measure of the similarity (or distance) between two character strings. Record linkage, for example, is an application of great interest in many domains in which Jaro similarity is commonly employed. Existing algorithms for computing the Jaro similarity between two given strings take quadratic time in the worst case. In this paper, we present an algorithm for Jaro similarity computation that takes only linear time. We also present experimental results showing that our algorithm outperforms existing algorithms.
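For orientation, the following is a minimal sketch of the conventional textbook computation of Jaro similarity, that is, the kind of baseline whose worst case is quadratic. It is not the linear-time algorithm of this paper, which the abstract does not describe. The sketch assumes the usual definition: with m matching characters and t transpositions (half the number of matched characters that appear in a different order), the similarity is (m/|s1| + m/|s2| + (m - t)/m)/3, where two characters match only if they are equal and at most max(|s1|, |s2|)/2 - 1 positions apart. The function name `jaro_similarity` and the convention that two identical strings (including two empty strings) score 1.0 are choices made for this illustration.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Baseline Jaro similarity (illustrative sketch, not the paper's method)."""
    if s1 == s2:
        return 1.0  # assumption: identical strings, even empty ones, score 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0

    # Characters may match only within this distance of each other.
    window = max(0, max(n1, n2) // 2 - 1)

    matched1 = [False] * n1
    matched2 = [False] * n2
    matches = 0

    # For each character of s1, scan its window in s2 for the first
    # unmatched equal character. The window can span a constant fraction
    # of s2, which is the source of the quadratic worst case.
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(n2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = True
                matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Walk the matched characters of both strings in order and count
    # positions where they disagree; transpositions are half that count.
    out_of_order = 0
    k = 0
    for i in range(n1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / n1 + matches / n2
            + (matches - transpositions) / matches) / 3.0


if __name__ == "__main__":
    print(jaro_similarity("MARTHA", "MARHTA"))
```

In the "MARTHA" / "MARHTA" example all six characters match and the pair T, H is swapped, giving one transposition and a similarity of 17/18.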
Acknowledgements.
This work was partially supported by the United States Census Bureau under Award Number CB21RMD0160003. The content is solely the responsibility of the authors and does not necessarily represent the official views of the US Census Bureau.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Basak, J. et al. (2023). On Computing the Jaro Similarity Between Two Strings. In: Guo, X., Mangul, S., Patterson, M., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2023. Lecture Notes in Computer Science, vol 14248. Springer, Singapore. https://doi.org/10.1007/978-981-99-7074-2_3
DOI: https://doi.org/10.1007/978-981-99-7074-2_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7073-5
Online ISBN: 978-981-99-7074-2
eBook Packages: Computer Science, Computer Science (R0)