Abstract
Identifying similar objects in a large collection of objects is a fundamental technique in several areas of computer science, e.g., case-based reasoning and machine discovery. Strings are the most basic representation of objects inside computers, and thus string similarity is one of the most important topics in computer science.
A similarity measure must be sensitive to the kinds of differences we wish to quantify. The weighted edit distance is one such framework: the measure can be varied by altering the weight assigned to each edit operation depending on the symbols involved. However, it does not suffice to solve ‘real problems’ (see, e.g., [2]). We consider that two objects which seem similar necessarily share a common structure, and that the degree of similarity depends on how valuable that common structure is. Based on this intuition, we present a unifying framework named the string resemblance system (SRS for short). In this framework, the similarity of two strings is viewed as the maximum score of a pattern that matches both of them. Different measures therefore differ only in the choice of (1) the pattern set to which common patterns belong, and (2) the pattern score function, which assigns a score to each pattern.
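The definition above (similarity = maximum score of a pattern matching both strings) can be illustrated with a brute-force sketch. This is not the paper's implementation: the function names `resemblance`, `vldc_patterns` and `symbol_count`, the `*` token standing for a variable-length don't care, and the exhaustive enumeration are all hypothetical choices made only to show how the pattern set and the score function plug in. The enumeration is exponential and suitable only for toy inputs; it also assumes `*` does not occur in the input strings.

```python
import re
from itertools import combinations
from typing import Callable, Iterable

# Hypothetical token for a variable-length don't care (matches any string,
# including the empty one). Assumes '*' never occurs in the input strings.
STAR = "*"


def matches(pattern: tuple[str, ...], text: str) -> bool:
    """Return True if the pattern matches the whole text."""
    regex = "".join(".*" if tok == STAR else re.escape(tok) for tok in pattern)
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


def resemblance(x: str, y: str,
                patterns: Callable[[str], Iterable[tuple[str, ...]]],
                score: Callable[[tuple[str, ...]], float]) -> float:
    """SRS-style similarity: maximum score over candidate patterns matching both strings."""
    best = 0
    for p in patterns(x):  # the pattern set is generated from x
        if matches(p, x) and matches(p, y):
            best = max(best, score(p))
    return best


def vldc_patterns(x: str) -> Iterable[tuple[str, ...]]:
    """Candidate patterns *s1*s2*...*sk* built from every subsequence of x."""
    for k in range(len(x) + 1):
        for idx in combinations(range(len(x)), k):
            toks = [STAR]
            for i in idx:
                toks += [x[i], STAR]
            yield tuple(toks)


def symbol_count(p: tuple[str, ...]) -> int:
    """Score of a pattern: the number of non-wildcard symbols it contains."""
    return sum(1 for tok in p if tok != STAR)


print(resemblance("acdeba", "abdac", vldc_patterns, symbol_count))  # 3
```

Swapping in a different pattern generator or score function yields a different measure while `resemblance` itself stays unchanged, which is the point of the unifying framework.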
For example, if we choose the set of patterns with variable-length don’t cares and define the score of a pattern to be the number of symbols in it, the resulting measure is the length of the longest common subsequence (LCS) of the two strings. For instance, the strings acdeba and abdac share the common pattern a⋆d⋆a⋆, which contains three symbols. With this framework, one can easily design and modify one’s own measures. In this paper we briefly describe SRSs and then report successful results of applications to literature and music.
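The LCS instance of the framework need not be computed by enumerating patterns; the standard dynamic-programming recurrence (see, e.g., [2]) gives the same value in O(|x||y|) time. The sketch below is illustrative only; the helper names `lcs_length` and `vldc_to_regex` are hypothetical. It also checks, by translating ⋆ into the regular-expression wildcard `.*`, that a⋆d⋆a⋆ matches both example strings.

```python
import re


def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence via the standard DP table."""
    # dp[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # drop one symbol
    return dp[len(x)][len(y)]


def vldc_to_regex(pattern: str) -> str:
    """Translate a pattern using '⋆' (variable-length don't care) into a regex."""
    return "".join(".*" if c == "⋆" else re.escape(c) for c in pattern)


pattern = "a⋆d⋆a⋆"
for s in ("acdeba", "abdac"):
    assert re.fullmatch(vldc_to_regex(pattern), s)  # pattern matches both strings

print(lcs_length("acdeba", "abdac"))  # 3, the symbol count of a⋆d⋆a⋆
```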
References
[1] D. Angluin. Finding patterns common to a set of strings. J. Comput. Syst. Sci., 21:46–62, 1980.
[2] D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York, 1997.
[3] T. Kadota, A. Ishino, M. Takeda, and F. Matsuo. On melodic similarity. IPSJ SIG Notes, 2000(49):15–24, 2000. (in Japanese).
[4] M. Mongeau and D. Sankoff. Comparison of musical sequences. Computers and the Humanities, 24(3):161–175, 1990.
[5] K. Tamari, M. Yamasaki, T. Kida, M. Takeda, T. Fukuda, and I. Nanri. Discovering poetic allusion in anthologies of classical Japanese poems. In Proc. 2nd International Conference on Discovery Science (DS’99), pages 128–138, 1999.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Takeda, M. (2001). String Resemblance Systems: A Unifying Framework for String Similarity with Applications to Literature and Music. In: Amir, A. (eds) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48194-X_13
DOI: https://doi.org/10.1007/3-540-48194-X_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42271-6
Online ISBN: 978-3-540-48194-2
eBook Packages: Springer Book Archive