Abstract
Many problems depend on a reliable measure of the distance or similarity between objects that are frequently represented as vectors. We consider here vectors that can be expressed as bit sequences. For such problems, the most heavily used measure is the Hamming distance, possibly normalized. The value of the Hamming distance is limited by the fact that it recognizes only exact matches, whereas in various applications, corresponding bits that are close by, but not exactly matched, can still be considered almost identical. We define here a “fuzzy Hamming distance” that extends the Hamming concept to give partial credit for near misses, and suggest a dynamic programming algorithm that permits it to be computed efficiently. We envision many uses for such a measure.
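To make the contrast concrete, the sketch below compares the ordinary Hamming distance with one possible "partial credit" variant computed by dynamic programming. The abstract does not give the formal definition, so the cost model here is an assumption made purely for illustration: a 1-bit may be shifted at a cost proportional to the distance moved, or left unmatched at a fixed cost. The names `fuzzy_hamming`, `shift_cost`, and `unmatched_cost` are hypothetical and may not correspond to the definitions used in the paper.

```python
def hamming(a: str, b: str) -> int:
    """Ordinary Hamming distance: number of positions where the bits differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def fuzzy_hamming(a: str, b: str,
                  shift_cost: float = 0.5,
                  unmatched_cost: float = 1.0) -> float:
    """Illustrative near-miss-tolerant distance (assumed cost model).

    A 1-bit of `a` may be paired with a 1-bit of `b` at a cost of
    shift_cost * |position difference|, or left unpaired at
    unmatched_cost. Pairings are kept in left-to-right order.
    """
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    p = [i for i, bit in enumerate(a) if bit == "1"]  # 1-bit positions in a
    q = [i for i, bit in enumerate(b) if bit == "1"]  # 1-bit positions in b

    # dp[i][j]: minimum cost of reconciling the first i ones of a
    # with the first j ones of b.
    dp = [[0.0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        dp[i][0] = i * unmatched_cost
    for j in range(1, len(q) + 1):
        dp[0][j] = j * unmatched_cost
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + unmatched_cost,  # leave a's 1-bit unpaired
                dp[i][j - 1] + unmatched_cost,  # leave b's 1-bit unpaired
                dp[i - 1][j - 1] + shift_cost * abs(p[i - 1] - q[j - 1]),
            )
    return dp[len(p)][len(q)]


print(hamming("10000", "01000"))        # 2
print(fuzzy_hamming("10000", "01000"))  # 0.5
```

Under these assumed costs, two vectors whose single 1-bits are adjacent score 2 under the Hamming distance but only 0.5 under the fuzzy variant, which is the kind of partial credit for near misses that the abstract describes.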
© 2001 Springer-Verlag Berlin Heidelberg
Bookstein, A., Tomi Klein, S., Raita, T. (2001). Fuzzy Hamming Distance: A New Dissimilarity Measure (Extended Abstract). In: Amir, A. (eds) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48194-X_7
DOI: https://doi.org/10.1007/3-540-48194-X_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42271-6
Online ISBN: 978-3-540-48194-2
eBook Packages: Springer Book Archive