skip to main content
article

The string edit distance matching problem with moves

Published: 02 February 2007 Publication History

Abstract

The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes, and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t.
We relax the problem so that: (a) we allow an additional operation, namely, substring moves; and (b) we allow approximation of this string edit distance. Our result is a near-linear time deterministic algorithm to produce a factor of O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L1 vector space using a simplified parsing technique, which we call edit-sensitive parsing (ESP).

References

[1]
Abrahamson, K. 1987. Generalized string matching. SIAM J. Comput. 16, 6, 1039--1051.
[2]
Alstrup, S., Brodal, G. S., and Rauhe, T. 2000. Pattern matching in dynamic texts. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 819--828.
[3]
Amir, A., Lewenstein, M., and Porat, E. 2000. Faster algorithms for string matching with k-mismatches. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 794--803.
[4]
Anderson, R. J., and Miller, G. L. 1991. Deterministic parallel list ranking. Algorithmica 6, 859--868.
[5]
Cole, R., and Hariharan, R. 1998. Approximate string matching: A simpler, faster algorithm. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 463--472.
[6]
Cole, R., and Vishkin, U. 1986. Deterministic coin tossing and accelerating cascades: Micro and macro techniques for designing parallel algorithms. In Proceedings of the ACM Symposium on Theory of Computing. 206--219.
[7]
Cormode, G., Paterson, M., Sahinalp, S. C., and Vishkin, U. 2000. Communication complexity of document exchange. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 197--206.
[8]
Crochemore, M., and Rytter, W. 1994. Text Algorithms. Oxford University Press.
[9]
Goel, A., Indyk, P., and Varadarajan, K. 2001. Reductions among high dimensional proximity problems. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 769--778.
[10]
Goldberg, A., Plotkin, S., and Shannon, G. 1987. Parallel symmetry-breaking in sparse graphs. In Proceedings of the ACM Symposium on Theory of Computing. 315--324.
[11]
Gusfield, D. 1997. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York.
[12]
Henzinger, M., Raghavan, P., and Rajagopalan, S. 1998. Computing on data streams. Tech. Rep. SRC 1998-011, DEC Systems Research Centre.
[13]
Indyk, P. 2000. Stable distributions, pseudorandom generators, embeddings and data stream computation. In IEEE Conference on the Foundations of Computer Science. 189--197.
[14]
Indyk, P., and Motwani, R. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the ACM Symposium on Theory of Computing. 604--613.
[15]
Karloff, H. 1993. Fast algorithms for approximately counting mismatches. Inf. Process. Lett. 48, 2, 53--60.
[16]
Karp, R. M., Miller, R. E., and Rosenberg, A. L. 1972. Rapid identification of repeated patterns in strings, trees and arrays. In Proceedings of the ACM Symposium on Theory of Computing. 125--136.
[17]
Karp, R. M., and Rabin, M. O. 1987. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31, 2, 249--260.
[18]
Landau, G., and Vishkin, U. 1989. Fast parallel and serial approximate string matching. J. Algorithms 10, 2, 157--169.
[19]
Landau, G. M., and Vishkin, U. 1986. Efficient string matching with k mismatches. Theor. Comput. Sci. 43, 239--249.
[20]
Levenshtein, V. I. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physic. Doklady. 10, 8, 707--710.
[21]
Masek, W. J., and Paterson, M. S. 1980. A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20, 18--31.
[22]
Mehlhorn, K., Sundar, R., and Uhrig, C. 1997. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17, 2, 183--198.
[23]
Muthukrishnan, S., Sahinalp, S. C. 2000. Approximate nearest neighbors and sequence comparison with block operations. In Proceedings of the ACM Symposium on Theory of Computing. 416--424.
[24]
Myers, E. W. 1986. An O(N D) difference algorithm and its variations. Algorithmica 1, 251--256.
[25]
Sahinalp, S. C., and Vishkin, U. 1994. Symmetry breaking for suffix tree construction. In Proceedings of the ACM Symposium on Theory of Computing. 300--309.
[26]
Sahinalp, S. C., and Vishkin, U. 1995. Data compression using locally consistent parsing. Tech. Rep., University of Maryland Department of Computer Science.
[27]
Sahinalp, S. C., and Vishkin, U. 1996. Efficient approximate and dynamic matching of patterns using a labeling paradigm. In IEEE Conference on the Foundations of Computer Science. 320--328.
[28]
Sankoff, D., and Kruskal, J. B. 1983. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.
[29]
Shapira, D., and Storer, J. A. 2002. Edit distance with move operations. In Proceedings of the 13th Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2373. 85--98.

Cited By

View all
  • (2023)Cluster-Based Data Aggregation in Flying Sensor Networks Enabled Internet of ThingsFuture Internet10.3390/fi1508027915:8(279)Online publication date: 20-Aug-2023
  • (2023)Transforming data from the image to the text domain: benign versus malignant micro-calcification classificationVAWKUM Transactions on Computer Sciences10.21015/vtcs.v11i2.168511:2(113-122)Online publication date: 31-Dec-2023
  • (2023)The edge-preservation similarity for comparing rooted, unordered, node-labeled treesPattern Recognition Letters10.1016/j.patrec.2023.02.017167(189-195)Online publication date: Mar-2023
  • Show More Cited By

Index Terms

  1. The string edit distance matching problem with moves

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Algorithms
    ACM Transactions on Algorithms  Volume 3, Issue 1
    February 2007
    178 pages
    ISSN:1549-6325
    EISSN:1549-6333
    DOI:10.1145/1186810
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 February 2007
    Published in TALG Volume 3, Issue 1

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Approximate pattern matching
    2. data streams
    3. edit distance
    4. embedding
    5. similarity search
    6. string matching

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)48
    • Downloads (Last 6 weeks)3
    Reflects downloads up to 07 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Cluster-Based Data Aggregation in Flying Sensor Networks Enabled Internet of ThingsFuture Internet10.3390/fi1508027915:8(279)Online publication date: 20-Aug-2023
    • (2023)Transforming data from the image to the text domain: benign versus malignant micro-calcification classificationVAWKUM Transactions on Computer Sciences10.21015/vtcs.v11i2.168511:2(113-122)Online publication date: 31-Dec-2023
    • (2023)The edge-preservation similarity for comparing rooted, unordered, node-labeled treesPattern Recognition Letters10.1016/j.patrec.2023.02.017167(189-195)Online publication date: Mar-2023
    • (2023)Assessing the benefits of virtual speaker lateralization for binaural speech intelligibility over the InternetApplied Acoustics10.1016/j.apacoust.2022.109146202(109146)Online publication date: Jan-2023
    • (2023)Signed rearrangement distances considering repeated genes, intergenic regions, and indelsJournal of Combinatorial Optimization10.1007/s10878-023-01083-w46:2Online publication date: 10-Sep-2023
    • (2023)Weak Inverse Neighborhoods of LanguagesDevelopments in Language Theory10.1007/978-3-031-33264-7_6(61-73)Online publication date: 19-May-2023
    • (2022)Sequence Similarity Network Analysis Provides Insight into the Temporal and Geographical Distribution of Mutations in SARS-CoV-2 Spike ProteinViruses10.3390/v1408167214:8(1672)Online publication date: 29-Jul-2022
    • (2022)A Compression-Based Multiple Subword Segmentation for Neural Machine TranslationElectronics10.3390/electronics1107101411:7(1014)Online publication date: 24-Mar-2022
    • (2022)Algorithms and Hardness for Scaffold Filling to Maximize Increased Duo-PreservationsIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2021.308389619:4(2071-2079)Online publication date: 1-Jul-2022
    • (2022)FM-Indexing Grammars Induced by Suffix Sorting for Long Patterns2022 Data Compression Conference (DCC)10.1109/DCC52660.2022.00014(63-72)Online publication date: Mar-2022
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media