String Resemblance Systems: A Unifying Framework for String Similarity with Applications to Literature and Music

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2089)

Abstract

Identifying similar objects in a large collection is a fundamental technique in several areas of computer science, e.g., case-based reasoning and machine discovery. Strings are the most basic representation of objects inside computers, and thus string similarity is one of the most important topics in computer science.

A similarity measure must be sensitive to the kind of differences we wish to quantify. The weighted edit distance is one such framework, in which the measure can be varied by changing the weight assigned to each edit operation depending on the symbols involved. However, it does not suffice to solve ‘real problems’ (see, e.g., [2]). Two objects that seem similar are considered to necessarily share a common structure, and the degree of similarity depends on how valuable that common structure is. Based on this intuition, we present a unifying framework, named string resemblance system (SRS, for short). In this framework, the similarity of two strings is viewed as the maximum score of a pattern that matches both of them. The measures therefore differ in the choice of (1) the pattern set to which common patterns belong, and (2) the pattern score function, which assigns a score to each pattern.
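The verbal definition above can be written compactly. The following is only a sketch: the symbols Π (pattern set), L(π) (the set of strings matched by pattern π), score, and SIM are notation assumed here for illustration and are not taken from the paper.

```latex
% Assumed notation: \Pi is the pattern set, L(\pi) the set of strings
% matched by pattern \pi, and score the pattern score function.
\[
  \mathrm{SIM}(x, y) \;=\; \max \bigl\{\, \mathrm{score}(\pi) \;:\; \pi \in \Pi,\; x \in L(\pi),\; y \in L(\pi) \,\bigr\}
\]
```

Under this reading, a particular similarity measure is fixed by choosing the pair (Π, score).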

For example, if we choose the set of patterns with variable-length don’t cares and define the score of a pattern to be the number of symbols in it, then the resulting measure is the length of the longest common subsequence (LCS) of the two strings. For instance, the strings acdeba and abdac have a common pattern a⋆d⋆a⋆, which contains three symbols. With this framework one can easily design and modify one's own measures. In this paper we briefly describe SRSs and then report successful results of applications to literature and music.
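The LCS example can be checked with a short sketch. It is not the paper's algorithm: it uses the textbook dynamic program for the LCS length, plus a small regex-based matcher for patterns with variable-length don't cares. The function names `lcs_length`, `vldc_matches`, and `symbol_count` are hypothetical, and the ASCII `*` stands in for the ⋆ of the abstract.

```python
import re


def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence of x and y (textbook DP)."""
    prev = [0] * (len(y) + 1)  # row of the DP table for the previous prefix of x
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def vldc_matches(pattern: str, text: str, wildcard: str = "*") -> bool:
    """True if text matches pattern, where wildcard matches any (possibly empty) string.

    Assumption: the wildcard is the single character `*`, standing in for the
    variable-length don't-care symbol of the abstract.
    """
    regex = "".join(".*" if ch == wildcard else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


def symbol_count(pattern: str, wildcard: str = "*") -> int:
    """Score used in the example: the number of non-wildcard symbols."""
    return sum(1 for ch in pattern if ch != wildcard)


if __name__ == "__main__":
    x, y, pattern = "acdeba", "abdac", "a*d*a*"
    print(vldc_matches(pattern, x), vldc_matches(pattern, y))
    print(symbol_count(pattern), lcs_length(x, y))
```

For the two strings from the abstract, the pattern matches both and its symbol count agrees with the LCS length computed by the dynamic program.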

References

  1. D. Angluin. Finding patterns common to a set of strings. J. Comput. Sys. Sci., 21:46–62, 1980.

  2. D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York, 1997.

  3. T. Kadota, A. Ishino, M. Takeda, and F. Matsuo. On melodic similarity. IPSJ SIG Notes, 2000 (49):15–24, 2000. (in Japanese).

  4. M. Mongeau and D. Sankoff. Comparison of musical sequences. Computers and the Humanities, 24(3):161–175, 1990.

  5. K. Tamari, M. Yamasaki, T. Kida, M. Takeda, T. Fukuda, and I. Nanri. Discovering poetic allusion in anthologies of classical Japanese poems. In Proc. 2nd International Conference on Discovery Science (DS’99), pages 128–138, 1999.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Takeda, M. (2001). String Resemblance Systems: A Unifying Framework for String Similarity with Applications to Literature and Music. In: Amir, A. (eds) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48194-X_13

  • DOI: https://doi.org/10.1007/3-540-48194-X_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42271-6

  • Online ISBN: 978-3-540-48194-2

  • eBook Packages: Springer Book Archive
