Phrase Similarity through the Edit Distance

  • Conference paper
Database and Expert Systems Applications (DEXA 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3180)

Abstract

This work intends to capture the concept of similarity between phrases. The algorithm is based on a dynamic programming approach integrating both the edit distance between parse trees and single-term similarity. Our work stresses the use of the underlying grammatical structure, which serves as a guide in the computation of semantic similarity between words. This proposal allows us to obtain a more accurate notion of semantic proximity at sentence level, without increasing the complexity of the pattern-matching algorithm on which it is based.
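The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions. It combines a dynamic-programming edit distance over parse trees with a word-level similarity used as the relabelling cost. It uses a simplified top-down tree edit distance (in the style of Selkow), not the authors' pattern-matching algorithm. The names `Node`, `TOY_SIMILARITY`, `relabel_cost`, and `tree_distance` are hypothetical, and the similarity table is a stand-in for a real lexical resource.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A parse-tree node: a grammatical category or a word (hypothetical type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


# Assumption: a toy word-similarity table in [0, 1]; a real system would
# derive such scores from a lexical resource rather than hard-code them.
TOY_SIMILARITY = {frozenset(("boy", "child")): 0.8}


def relabel_cost(x: str, y: str) -> float:
    """Cost of replacing label x by y: 0 if equal, else 1 - similarity."""
    if x == y:
        return 0.0
    return 1.0 - TOY_SIMILARITY.get(frozenset((x, y)), 0.0)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def tree_distance(a: Node, b: Node,
                  relabel: Callable[[str, str], float] = relabel_cost) -> float:
    """Top-down tree edit distance: roots are mapped to each other and the
    child sequences are aligned with a string-style edit distance whose
    insert/delete cost is the size of the whole subtree."""
    m, n = len(a.children), len(b.children)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a.children[i - 1]),      # delete subtree
                d[i][j - 1] + size(b.children[j - 1]),      # insert subtree
                d[i - 1][j - 1]                              # map subtree to subtree
                + tree_distance(a.children[i - 1], b.children[j - 1], relabel),
            )
    return relabel(a.label, b.label) + d[m][n]


def phrase(subject: str) -> Node:
    """Build a toy parse tree for 'the <subject> runs'."""
    return Node("S", [
        Node("NP", [Node("the"), Node(subject)]),
        Node("VP", [Node("runs")]),
    ])


if __name__ == "__main__":
    print(tree_distance(phrase("boy"), phrase("child")))
    print(tree_distance(phrase("boy"), phrase("dog")))
```

In this sketch the two phrases share the same grammatical structure, so the structural part of the distance is zero and only the word-level cost remains: "boy" against "child" is cheaper than "boy" against "dog" because the toy table scores the first pair as similar. The recursion here is not optimized; it is meant only to illustrate how structure can guide where word similarity is evaluated.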

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vilares, M., Ribadas, F.J., Vilares, J. (2004). Phrase Similarity through the Edit Distance. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_30

  • DOI: https://doi.org/10.1007/978-3-540-30075-5_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22936-0

  • Online ISBN: 978-3-540-30075-5

  • eBook Packages: Springer Book Archive
