String Resemblance Systems: A Unifying Framework for String Similarity with Applications to Literature and Music

Takeda, Masayuki

doi:10.1007/3-540-48194-X_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2089)

Included in the following conference series: Annual Symposium on Combinatorial Pattern Matching

Abstract

Identifying similar objects in a large collection is a fundamental technique in several areas of computer science, e.g., case-based reasoning and machine discovery. Strings are the most basic representations of objects inside computers, and thus string similarity is one of the most important topics in computer science.

A similarity measure must be sensitive to the kind of differences we wish to quantify. The weighted edit distance is one such framework, in which the measure can be varied by altering the weight assigned to each edit operation depending on the symbols involved. However, it does not suffice to solve ‘real problems’ (see, e.g., [2]). We consider that two objects necessarily share a common structure if they seem similar, and that the degree of similarity depends on how valuable that common structure is. Based on this intuition, we present a unifying framework, named the string resemblance system (SRS, for short). In this framework, the similarity of two strings can be viewed as the maximum score of a pattern that matches both of them. The measures therefore differ only in the choice of (1) the pattern set to which common patterns belong, and (2) the pattern score function, which assigns a score to each pattern.
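The two choices named above (a pattern set and a pattern score function) can be illustrated with a small brute-force sketch. It is not the paper's algorithm: the function names are hypothetical, the pattern set is fixed to subsequence-style patterns, and the exhaustive enumeration is exponential, so it is only suitable for very short strings. The point is that the score function is a pluggable parameter.

```python
from itertools import combinations
from typing import Callable


def is_subsequence(pattern: str, text: str) -> bool:
    """True if the symbols of `pattern` occur in `text` in the same order."""
    remaining = iter(text)
    return all(symbol in remaining for symbol in pattern)


def srs_similarity(x: str, y: str, score: Callable[[str], float]) -> float:
    """Maximum score over all patterns that match both x and y.

    Assumption: the pattern set is the set of subsequences, i.e. symbols
    separated by gaps of arbitrary length. Brute force, for illustration only.
    """
    shorter, longer = (x, y) if len(x) <= len(y) else (y, x)
    best = score("")  # the empty pattern matches every string
    for k in range(1, len(shorter) + 1):
        for positions in combinations(range(len(shorter)), k):
            pattern = "".join(shorter[i] for i in positions)
            if is_subsequence(pattern, longer):
                best = max(best, score(pattern))
    return best


def weighted_score(pattern: str) -> int:
    """Hypothetical score function that values the symbol 'b' more highly."""
    return sum(5 if symbol == "b" else 1 for symbol in pattern)


by_length = srs_similarity("acdeba", "abdac", len)
by_weight = srs_similarity("acdeba", "abdac", weighted_score)
```

Swapping `len` for `weighted_score` changes which common pattern wins without touching the search, which is the kind of modification the framework is meant to make easy.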

For example, if we choose the set of patterns with variable-length don’t cares and define the score of a pattern to be the number of symbols in it, then the resulting measure is the length of the longest common subsequence (LCS) of the two strings. Indeed, the strings acdeba and abdac have a common pattern a⋆d⋆a⋆, which contains three symbols. With this framework one can easily design and modify one’s own measures. In this paper we briefly describe SRSs and then report successful results of applications to literature and music.
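The LCS example can be checked with a short sketch. It uses the standard dynamic-programming recurrence for LCS length rather than anything specific to the paper, and it translates the pattern a⋆d⋆a⋆ into a regular expression under the assumption that ⋆ stands for a variable-length don't care (any string, possibly empty). The helper names are illustrative.

```python
import re


def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence, using O(len(y)) space."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def matches_vldc_pattern(pattern: str, text: str, wildcard: str = "⋆") -> bool:
    """Check whether `text` matches a pattern with variable-length don't cares.

    Assumption: each wildcard matches any string, including the empty one.
    """
    regex = "".join(".*" if ch == wildcard else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


pattern = "a⋆d⋆a⋆"
both_match = all(matches_vldc_pattern(pattern, s) for s in ("acdeba", "abdac"))
symbols_in_pattern = sum(ch != "⋆" for ch in pattern)
longest = lcs_length("acdeba", "abdac")
```

Here the pattern matches both strings and contains three symbols, which agrees with the LCS length computed by the recurrence.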

References

[1] D. Angluin. Finding patterns common to a set of strings. J. Comput. Sys. Sci., 21:46–62, 1980.
[2] D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York, 1997.
[3] T. Kadota, A. Ishino, M. Takeda, and F. Matsuo. On melodic similarity. IPSJ SIG Notes, 2000(49):15–24, 2000. (in Japanese).
[4] M. Mongeau and D. Sankoff. Comparison of musical sequences. Computers and the Humanities, 24(3):161–175, 1990.
[5] K. Tamari, M. Yamasaki, T. Kida, M. Takeda, T. Fukuda, and I. Nanri. Discovering poetic allusion in anthologies of classical Japanese poems. In Proc. 2nd International Conference on Discovery Science (DS’99), pages 128–138, 1999.

Author information

Authors and Affiliations

Department of Informatics, Kyushu University 33, Fukuoka, 812-8581, Japan
Masayuki Takeda

Editor information

Editors and Affiliations

Department of Computer Science, Bar-Ilan University, 52900, Ramat-Gan, Israel, Atlanta, Georgia, 30332-0280, USA
Amihood Amir

About this paper

Cite this paper

Takeda, M. (2001). String Resemblance Systems: A Unifying Framework for String Similarity with Applications to Literature and Music. In: Amir, A. (ed.) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48194-X_13

DOI: https://doi.org/10.1007/3-540-48194-X_13
Published: 13 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42271-6
Online ISBN: 978-3-540-48194-2
eBook Packages: Springer Book Archive
