Fast approximate matching of words against a dictionary

Schneller approximativer Vergleich von Wörtern mit einem Wörterbuch

Computing

Abstract

A new algorithm for string edit distance computation is given. The algorithm assumes that one of the two strings to be compared is a dictionary entry that is known a priori. This dictionary word is converted in an off-line phase into a deterministic finite state automaton. Given an input string and the automaton derived from the dictionary word, computing the edit distance between the two strings corresponds to a traversal of the states of the automaton. This procedure needs time that is only linear in the length of the input string and is independent of the length of the dictionary word. Given not just one but N different dictionary words, their corresponding automata can be combined into a single deterministic finite state automaton. Thus computing the edit distance between the input word and each dictionary entry, and determining the nearest neighbor in the dictionary, need time that is only linear in the length of the input string. However, the number of states of the automaton is exponential.
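The idea of trading an off-line construction for a fast on-line traversal can be illustrated with a small sketch. This is not the construction from the paper, whose details are not given in the abstract. It assumes unit edit costs (Levenshtein distance) and takes as automaton states the vectors of differences between adjacent cells of a dynamic-programming column, which can only be -1, 0, or 1, so the state set is finite but can grow exponentially with word length. All names (`build_automaton`, `distances`, `nearest`, `OTHER`) are hypothetical.

```python
from collections import deque

OTHER = None  # stand-in for every character that occurs in no dictionary word


def _advance(word, diffs, ch):
    """One dynamic-programming column update, expressed on column differences."""
    prev = [0]  # previous column, measured relative to its top cell
    for d in diffs:
        prev.append(prev[-1] + d)
    cur = [1]  # the top cell grows by one per input character
    for i in range(1, len(word) + 1):
        cost = 0 if word[i - 1] == ch else 1
        cur.append(min(prev[i] + 1, cur[i - 1] + 1, prev[i - 1] + cost))
    return tuple(cur[i] - cur[i - 1] for i in range(1, len(word) + 1))


def build_automaton(words):
    """Off-line phase: enumerate all reachable states breadth-first."""
    symbols = set("".join(words))
    start = tuple((1,) * len(w) for w in words)
    ids = {start: 0}
    offsets = [tuple(len(w) for w in words)]  # distance minus input length
    delta = []  # delta[state_id][symbol] -> state_id
    queue = deque([start])
    while queue:
        state = queue.popleft()
        row = {}
        for ch in list(symbols) + [OTHER]:
            nxt = tuple(_advance(w, d, ch) for w, d in zip(words, state))
            if nxt not in ids:
                ids[nxt] = len(ids)
                offsets.append(tuple(sum(d) for d in nxt))
                queue.append(nxt)
            row[ch] = ids[nxt]
        delta.append(row)
    # The nearest word per state can be fixed off-line as well.
    best = [min(range(len(words)), key=off.__getitem__) for off in offsets]
    return delta, offsets, best, symbols


def _run(automaton, text):
    """On-line phase: one table lookup per input character."""
    delta, _, _, symbols = automaton
    state = 0
    for ch in text:
        state = delta[state][ch if ch in symbols else OTHER]
    return state


def distances(automaton, text):
    """Edit distance from `text` to every dictionary word."""
    state = _run(automaton, text)
    return [len(text) + off for off in automaton[1][state]]


def nearest(words, automaton, text):
    """Nearest dictionary word and its distance."""
    state = _run(automaton, text)
    k = automaton[2][state]
    return words[k], len(text) + automaton[1][state][k]
```

A call such as `build_automaton(["cat", "cart", "dog"])` does all the expensive work once. Afterwards `nearest` walks the transition table with one lookup per input character and reads the precomputed winner from the final state, so its cost does not grow with the number or length of the dictionary words. The price is the size of the table, which in this sketch is bounded only by the product of 3 raised to each word length.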

Summary

A new algorithm for computing the edit distance between strings is presented. The algorithm rests on the assumption that one of the two strings to be compared is a dictionary entry known a priori. This dictionary entry is converted in an off-line phase into a deterministic finite automaton. Given the automaton and an input word, computing the edit distance corresponds to traversing states of this automaton. This procedure takes time that depends only linearly on the length of the input word and is independent of the length of the dictionary entry. The finite automata belonging to N different dictionary entries can be merged into a single automaton. In this way, computing the edit distance between the input word and every dictionary entry, and determining the nearest neighbor in the dictionary, require only time linear in the length of the input word. The number of states of the automaton, however, is of exponential order.

Cite this article

Bunke, H. Fast approximate matching of words against a dictionary. Computing 55, 75–89 (1995). https://doi.org/10.1007/BF02238238

  • DOI: https://doi.org/10.1007/BF02238238
