Fuzzy Hamming Distance: A New Dissimilarity Measure (Extended Abstract)

Bookstein, Abraham; Klein, Shmuel Tomi; Raita, Timo

doi:10.1007/3-540-48194-X_7

Conference paper
First Online: 01 January 2001

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2089)

Abstract

Many problems depend on a reliable measure of the distance or similarity between objects that are frequently represented as vectors. We consider here vectors that can be expressed as bit sequences. For such problems, the most heavily used measure is the Hamming distance, perhaps normalized. The value of the Hamming distance is limited by the fact that it counts only exact matches, whereas in various applications, corresponding bits that are close by, but not exactly matched, can still be considered almost identical. We define here a “fuzzy Hamming distance” that extends the Hamming concept to give partial credit for near misses, and suggest a dynamic programming algorithm that permits it to be computed efficiently. We envision many uses for such a measure.
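
To make the contrast in the abstract concrete, the sketch below compares the plain Hamming distance with one possible "partial credit" variant computed by dynamic programming. The abstract does not state the paper's actual definition or algorithm, so everything here is an assumption made for illustration: the function names `hamming` and `fuzzy_hamming`, the parameters `shift_cost` and `indel_cost`, and the cost model itself (a 1-bit may be moved at a cost proportional to the distance moved, or deleted and re-inserted at a fixed cost).

```python
from typing import List, Sequence


def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """Plain Hamming distance: the number of positions where the bits differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(a != b for a, b in zip(x, y))


def fuzzy_hamming(
    x: Sequence[int],
    y: Sequence[int],
    shift_cost: float = 0.25,   # assumed cost of moving a 1-bit by one position
    indel_cost: float = 1.0,    # assumed cost of deleting or inserting a 1-bit
) -> float:
    """Illustrative near-miss distance (an assumed cost model, not the paper's).

    The 1-bits of x are aligned, in order, with the 1-bits of y. An aligned
    pair costs shift_cost times the positional gap; an unaligned 1-bit costs
    indel_cost. The minimum total cost is found with an edit-distance-style
    dynamic program over the positions of the 1-bits.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    p: List[int] = [i for i, bit in enumerate(x) if bit]
    q: List[int] = [j for j, bit in enumerate(y) if bit]

    # d[i][j] = minimum cost of turning the first i ones of x
    # into the first j ones of y.
    d = [[0.0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        d[i][0] = i * indel_cost
    for j in range(1, len(q) + 1):
        d[0][j] = j * indel_cost
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                               # drop a 1 from x
                d[i][j - 1] + indel_cost,                               # add a 1 from y
                d[i - 1][j - 1] + shift_cost * abs(p[i - 1] - q[j - 1]),  # shift a 1
            )
    return d[len(p)][len(q)]


x = [0, 1, 0, 0, 1, 0]
y = [0, 0, 1, 0, 1, 0]
print(hamming(x, y))        # 2
print(fuzzy_hamming(x, y))  # 0.25
```

In this example the two vectors differ only by a 1-bit displaced by one position. The plain Hamming distance counts two mismatches, while the assumed fuzzy variant charges a single small shift. When the displaced bits are far apart, deleting and re-inserting becomes cheaper than shifting, and with the default costs the result falls back to the plain Hamming distance.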

Author information

Authors and Affiliations

Center for Information and Language Studies, University of Chicago, Chicago, IL, 60637
Abraham Bookstein
Dept. of Math. & CS, Bar Ilan University, Ramat-Gan, 52900, Israel
Shmuel Tomi Klein
Comp. Sci. Dept., University of Turku, 20520, Turku, Finland
Timo Raita

Editor information

Editors and Affiliations

Department of Computer Science, Bar-Ilan University, 52900, Ramat-Gan, Israel, Atlanta, Georgia, 30332-0280, USA
Amihood Amir

About this paper

Cite this paper

Bookstein, A., Klein, S.T., Raita, T. (2001). Fuzzy Hamming Distance: A New Dissimilarity Measure (Extended Abstract). In: Amir, A. (ed.) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48194-X_7

DOI: https://doi.org/10.1007/3-540-48194-X_7
Published: 13 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42271-6
Online ISBN: 978-3-540-48194-2
eBook Packages: Springer Book Archive
