
An algorithm for matching run-length coded strings

Ein Algorithmus für den Vergleich lauflängencodierter Zeichenketten

Computing

Abstract

An algorithm for the computation of the edit distance of run-length coded strings is given. In run-length coding, not all individual symbols in a string are explicitly listed. Instead, each run of identical consecutive symbols is coded by giving one representative symbol together with its multiplicity. The algorithm determines the minimum-cost sequence of edit operations transforming one string into another. In the worst case, the algorithm has a time complexity of O(n·m), where n and m give the lengths of the strings to be compared. In the best case, the time complexity is O(k·l), where k and l are the numbers of runs of identical symbols in the two strings under comparison.
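To make the terms in the abstract concrete, the following Python sketch shows run-length coding as a list of (symbol, multiplicity) pairs and a plain dynamic-programming edit distance computed on the decoded strings. It is only a baseline for comparison, not the algorithm of the paper: it always takes O(n·m) time and does not exploit the run structure. The function names and the unit cost for each insertion, deletion, and substitution are assumptions made for illustration; the abstract does not specify a cost model.

```python
from itertools import groupby


def rle_encode(s):
    """Run-length code s as a list of (symbol, multiplicity) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(s)]


def rle_decode(runs):
    """Expand a list of (symbol, multiplicity) pairs back into a string."""
    return "".join(sym * count for sym, count in runs)


def edit_distance(a, b):
    """Edit distance with unit costs (an assumption), in O(n*m) time.

    Standard row-by-row dynamic programming; only two rows are kept.
    """
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def rle_edit_distance_baseline(runs_a, runs_b):
    """Baseline: decode both run-length coded strings, then compare them."""
    return edit_distance(rle_decode(runs_a), rle_decode(runs_b))
```

For example, `rle_encode("aaabbc")` gives three runs for a string of six symbols, so here k = 3 and n = 6. The gap between the number of runs and the string length is what separates the best-case bound O(k·l) from the worst-case bound O(n·m) stated above.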





Cite this article

Bunke, H., Csirik, J. An algorithm for matching run-length coded strings. Computing 50, 297–314 (1993). https://doi.org/10.1007/BF02243873


  • DOI: https://doi.org/10.1007/BF02243873
