Abstract
Lexical similarity measures calculate the similarity between strings. Existing lexical methods are usually based on either n-grams or Dice's approach. These measures perform well and can be extended by adjusting a parameter. However, they do not return reasonable results in some situations, for example when two strings are quite similar or when they contain the same set of characters in different positions. To deal with this problem, this paper presents an approach that improves a lexical measure by combining information-theoretic and edit distance measures. The proposed method is tested on part of the OAEI 2008 benchmark. The results show that the approach has some notable advantages over other lexical methods. It is also flexible and convenient to implement. Moreover, we select a range of suitable parameter values that can be applied in different domains.
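The failure case mentioned in the abstract (same set of characters, different positions) can be illustrated with a minimal Python sketch. This is not the measure proposed in the paper; it only contrasts two textbook baselines, a Dice coefficient over character n-gram sets and a length-normalised Levenshtein similarity. The function names, the choice of n = 1, and the normalisation by the longer string are assumptions made for illustration.

```python
def char_ngrams(s, n):
    """Return the set of character n-grams of s (illustrative helper)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_dice(a, b, n=1):
    """Dice coefficient over the n-gram sets of a and b."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 1.0  # assumption: two strings without n-grams count as identical
    return 2 * len(x & y) / (len(x) + len(y))


def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Edit distance mapped to [0, 1]; dividing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


if __name__ == "__main__":
    a, b = "listen", "silent"
    print(ngram_dice(a, b, n=1))   # set-based score ignores character positions
    print(levenshtein(a, b))       # edit distance is sensitive to positions
    print(edit_similarity(a, b))
```

For "listen" and "silent" the character-set Dice score is 1.0 although the strings differ, while the edit distance is 4. A measure that draws on both kinds of information, as the abstract proposes, is meant to avoid this kind of blind spot.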
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, T.T.A., Conrad, S. (2015). An Improved String Similarity Measure Based on Combining Information-Theoretic and Edit Distance Methods. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2014. Communications in Computer and Information Science, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-319-25840-9_15
DOI: https://doi.org/10.1007/978-3-319-25840-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25839-3
Online ISBN: 978-3-319-25840-9
eBook Packages: Computer Science, Computer Science (R0)