Fast approximate matching of words against a dictionary

Bunke, H.

doi:10.1007/BF02238238

Schneller approximativer Vergleich von Wörtern mit einem Wörterbuch

Published: March 1995

Computing, Volume 55, pages 75–89 (1995)

H. Bunke¹

Abstract

A new algorithm for string edit distance computation is given. The algorithm assumes that one of the two strings to be compared is a dictionary entry that is known a priori. This dictionary word is converted in an off-line phase into a deterministic finite state automaton. Given an input string and the automaton derived from the dictionary word, the computation of the edit distance between the two strings corresponds to a traversal of the states of the automaton. This procedure needs time that is only linear in the length of the input string and is independent of the length of the dictionary word. Given not just one but N different dictionary words, their corresponding automata can be combined into a single deterministic finite state automaton. Thus the computation of the edit distance between the input word and each dictionary entry, and the determination of the nearest neighbor in the dictionary, need time that is only linear in the length of the input string. However, the number of states of the automaton is exponential.
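The idea of turning a known dictionary word into an automaton can be illustrated with a minimal Python sketch. It is not the construction from the paper, which the abstract does not spell out. It rests on one assumption: each automaton state is a column of the standard Wagner-Fischer dynamic-programming table for the fixed word. States are built lazily and cached, rather than in a separate off-line phase, and the class name `LazyEditAutomaton` and its methods are hypothetical.

```python
from typing import Dict, Tuple

# Assumption: a state is one column of the Wagner-Fischer table, i.e.
# state[i] = edit distance between the input read so far and word[:i].
State = Tuple[int, ...]


class LazyEditAutomaton:
    """Hypothetical sketch of an edit-distance automaton for one fixed word."""

    def __init__(self, word: str) -> None:
        self.word = word
        # Start state: the empty input is i insertions away from word[:i].
        self.start: State = tuple(range(len(word) + 1))
        # Transition table, filled on demand: (state, char) -> next state.
        self.transitions: Dict[Tuple[State, str], State] = {}

    def step(self, state: State, ch: str) -> State:
        key = (state, ch)
        nxt = self.transitions.get(key)
        if nxt is None:
            # Cache miss: compute the next DP column once, then reuse it.
            col = [state[0] + 1]
            for i, w in enumerate(self.word, start=1):
                col.append(min(
                    state[i] + 1,                # ch has no partner in the word
                    col[i - 1] + 1,              # w has no partner in the input
                    state[i - 1] + (w != ch),    # match or substitution
                ))
            nxt = tuple(col)
            self.transitions[key] = nxt
        return nxt

    def distance(self, text: str) -> int:
        # Traverse the automaton: one transition per input character.
        state = self.start
        for ch in text:
            state = self.step(state, ch)
        return state[-1]


automaton = LazyEditAutomaton("sitting")
print(automaton.distance("kitten"))
```

Once a transition is cached, following it no longer requires recomputing a table column. A fully precomputed automaton would additionally number its states so that each step is a single constant-time table lookup, which this sketch does not attempt.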

Zusammenfassung (translated from German)

A new algorithm for computing the edit distance between strings is presented. The algorithm is based on the assumption that one of the two strings to be compared is an entry in a dictionary that is known a priori. This dictionary entry is converted in an off-line phase into a deterministic finite automaton. For a given automaton and an input word, computing the edit distance corresponds to a traversal of various states of this automaton. This procedure needs time that depends only linearly on the length of the input word. The time is independent of the length of the dictionary entry. The finite automata belonging to N different dictionary entries can be combined into a single automaton. In this way, computing the edit distance between the input word and every dictionary entry, as well as determining the nearest neighbor in the dictionary, needs only time linear in the length of the input word. The number of states of the automaton, however, is of exponential order.
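Combining the automata of N dictionary words into a single automaton can be sketched as a product construction. This is an assumption about how such a combination might look, not the paper's own method. Here a combined state is a tuple holding one dynamic-programming column per word, and the names `CombinedAutomaton`, `next_column`, and `nearest` are hypothetical.

```python
from typing import Dict, List, Tuple

Column = Tuple[int, ...]    # DP column for one dictionary word
State = Tuple[Column, ...]  # assumed combined state: one column per word


def next_column(word: str, col: Column, ch: str) -> Column:
    """Advance one word's Wagner-Fischer column by the input character ch."""
    new = [col[0] + 1]
    for i, w in enumerate(word, start=1):
        new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + (w != ch)))
    return tuple(new)


class CombinedAutomaton:
    """Hypothetical product automaton over several dictionary words."""

    def __init__(self, words: List[str]) -> None:
        self.words = list(words)
        self.start: State = tuple(tuple(range(len(w) + 1)) for w in self.words)
        # Lazily filled transition table: (state, char) -> next state.
        self.table: Dict[Tuple[State, str], State] = {}

    def step(self, state: State, ch: str) -> State:
        key = (state, ch)
        if key not in self.table:
            self.table[key] = tuple(
                next_column(w, c, ch) for w, c in zip(self.words, state)
            )
        return self.table[key]

    def nearest(self, text: str) -> Tuple[str, int]:
        state = self.start
        for ch in text:
            state = self.step(state, ch)
        # The last entry of each column is the distance to the whole word.
        distances = [col[-1] for col in state]
        best = min(range(len(self.words)), key=distances.__getitem__)
        return self.words[best], distances[best]


dictionary = CombinedAutomaton(["sitting", "mitten", "bitten"])
print(dictionary.nearest("kitten"))
```

In this sketch the nearest word is read off by scanning all N distances at the end, and ties go to the earlier word. A precomputed automaton could instead store the nearest neighbor with each state. The number of distinct combined states can grow very quickly with the dictionary, which is in line with the exponential state count noted in the abstract.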

Author information

Authors and Affiliations

Institut für Informatik und angewandte Mathematik, Neubrückstr. 10, CH-3012, Bern, Switzerland
H. Bunke

About this article

Bunke, H. Fast approximate matching of words against a dictionary. Computing 55, 75–89 (1995). https://doi.org/10.1007/BF02238238

Received: 28 June 1994
Issue Date: March 1995
DOI: https://doi.org/10.1007/BF02238238
