Fuzzy Hamming Distance: A New Dissimilarity Measure (Extended Abstract)

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2089)

Abstract

Many problems depend on a reliable measure of the distance or similarity between objects that are frequently represented as vectors. We consider here vectors that can be expressed as bit sequences. For such problems, the most heavily used measure is the Hamming distance, possibly normalized. The usefulness of the Hamming distance is limited by the fact that it counts only exact matches, whereas in various applications, corresponding bits that are close by, but not exactly matched, can still be considered almost identical. We define here a “fuzzy Hamming distance” that extends the Hamming concept to give partial credit for near misses, and suggest a dynamic programming algorithm that permits it to be computed efficiently. We envision many uses for such a measure.
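To make the limitation described in the abstract concrete, the following minimal Python sketch contrasts the ordinary (and normalized) Hamming distance with one possible near-miss-tolerant measure. The abstract does not give the definition of the fuzzy Hamming distance, so `near_miss_distance`, its cost parameters `c_ins`, `c_del`, and `c_shift`, and the linear shift cost are all assumptions made for illustration, not the authors' definition or algorithm. The sketch aligns the positions of the 1-bits of the two vectors with a standard edit-distance-style dynamic program.

```python
from typing import List, Sequence


def hamming_distance(x: Sequence[int], y: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit vectors differ."""
    if len(x) != len(y):
        raise ValueError("bit vectors must have the same length")
    return sum(a != b for a, b in zip(x, y))


def normalized_hamming(x: Sequence[int], y: Sequence[int]) -> float:
    """Hamming distance divided by the vector length (0.0 for empty vectors)."""
    return hamming_distance(x, y) / len(x) if len(x) else 0.0


def near_miss_distance(
    x: Sequence[int],
    y: Sequence[int],
    c_ins: float = 1.0,    # assumed cost of a 1-bit present only in y
    c_del: float = 1.0,    # assumed cost of a 1-bit present only in x
    c_shift: float = 0.5,  # assumed cost per position a 1-bit is moved
) -> float:
    """Illustrative near-miss-tolerant distance (hypothetical cost model).

    Aligns the sorted 1-bit positions of x and y. Each 1-bit of x is either
    deleted or shifted onto a 1-bit of y; unmatched 1-bits of y are inserted.
    """
    p: List[int] = [i for i, bit in enumerate(x) if bit]
    q: List[int] = [j for j, bit in enumerate(y) if bit]
    m, n = len(p), len(q)

    # d[i][j] = cheapest way to turn the first i ones of x into the first j ones of y
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * c_del
    for j in range(1, n + 1):
        d[0][j] = j * c_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + c_del,
                d[i][j - 1] + c_ins,
                d[i - 1][j - 1] + c_shift * abs(p[i - 1] - q[j - 1]),
            )
    return d[m][n]


x = [1, 0, 1, 0, 0]
y = [1, 0, 0, 1, 0]  # same as x, but the second 1-bit is one position later

print(hamming_distance(x, y))    # 2
print(normalized_hamming(x, y))  # 0.4
print(near_miss_distance(x, y))  # 0.5
```

Under this assumed cost model, the one-position near miss costs 0.5 rather than the 2 that the Hamming distance charges. With `c_ins = c_del = 1` and `c_shift >= 2`, shifting is never cheaper than a deletion plus an insertion, so the measure coincides with the ordinary Hamming distance.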



Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bookstein, A., Klein, S.T., Raita, T. (2001). Fuzzy Hamming Distance: A New Dissimilarity Measure (Extended Abstract). In: Amir, A. (eds) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48194-X_7

  • DOI: https://doi.org/10.1007/3-540-48194-X_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42271-6

  • Online ISBN: 978-3-540-48194-2

  • eBook Packages: Springer Book Archive
