A subquadratic algorithm for approximate limited expression matching

Algorithmica

Abstract

In this paper we present an efficient subquadratic-time algorithm for matching strings and limited expressions in large texts. Limited expressions are a subset of regular expressions that appear often in practice. The generalization from simple strings to limited expressions has a negligible effect on the speed of our algorithm, yet allows much more flexibility. Our algorithm is similar in spirit to that of Masek and Paterson [MP], but it is much faster in practice. Our experiments show a four- to fivefold speedup over the algorithms of Sellers [Se] and Ukkonen [Uk1], independent of the sizes of the input strings. Experiments also reveal our algorithm to be faster, in most cases, than a recent improvement by Chang and Lampe [CL2], especially for small alphabet sizes, for which it is two to three times faster.
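For orientation, the quadratic baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the classic column-by-column dynamic program usually attributed to Sellers [Se], assuming unit-cost insertions, deletions, and substitutions; it is not the subquadratic algorithm of this paper, and the function name `sellers_search` and its return format are hypothetical choices made for this sketch.

```python
def sellers_search(pattern, text, k):
    """Report (end_index, distance) for every text position where some
    substring ending there is within edit distance k of the pattern.

    Assumption: unit costs for insertion, deletion, and substitution.
    Runs in O(len(pattern) * len(text)) time, i.e. the quadratic baseline.
    """
    m = len(pattern)
    # col[i] holds the best distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # value of row 0 in the previous column (always 0)
        # col[0] stays 0: a match may start at any text position.
        for i in range(1, m + 1):
            cur = min(
                prev_diag + (pattern[i - 1] != c),  # match or substitution
                col[i] + 1,                         # extra text character
                col[i - 1] + 1,                     # missing pattern character
            )
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            hits.append((j, col[m]))
    return hits


if __name__ == "__main__":
    print(sellers_search("abc", "xxabcxx", 0))
    print(sellers_search("abc", "abd", 1))
```

Each text character costs one pass over the whole pattern column, which is the O(mn) behavior that subquadratic methods in the style of Masek and Paterson [MP] aim to improve on.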


References

  1. K. Abrahamson, Generalized string matching, SIAM J. Comput., 16 (1987), 1039–1051.

  2. A. V. Aho and M. J. Corasick, Efficient string matching: an aid to bibliographic search, Comm. ACM, 18 (1975), 333–340.

  3. V. L. Arlazarov, E. A. Dinic, M. A. Kronrod, and I. A. Faradzev, On economic construction of the transitive closure of a directed graph, Dokl. Akad. Nauk SSSR, 194 (1970), 487–488 (in Russian). English translation in Soviet Math. Dokl., 11 (1975), 1209–1210.

  4. R. A. Baeza-Yates and G. H. Gonnet, A new approach to text searching, Comm. ACM, 35 (1992), 74–82.

  5. R. S. Boyer and J. S. Moore, A fast string searching algorithm, Comm. ACM, 20 (1977), 762–772.

  6. W. I. Chang and E. L. Lawler, Approximate string matching in sublinear expected time, Proc. 31st Symp. on Foundations of Computer Science, 1990, pp. 116–124.

  7. W. I. Chang and J. Lampe, Theoretical and empirical comparisons of approximate string matching algorithms, Proc. 3rd Symp. on Combinatorial Pattern Matching, Tucson, AZ, April 1992, pp. 172–181.

  8. B. Commentz-Walter, A string matching algorithm fast on the average, Proc. 6th Internat. Colloq. on Automata, Languages, and Programming, 1979, pp. 118–132.

  9. M. Fischer and M. Paterson, String matching and other products, Proc. 7th SIAM-AMS Symp. on Complexity of Computation, 1974, pp. 113–125.

  10. Z. Galil and K. Park, An improved algorithm for approximate string matching, SIAM J. Comput., 19 (1990), 989–999.

  11. D. E. Knuth, J. H. Morris, and V. R. Pratt, Fast pattern matching in strings, SIAM J. Comput., 6 (1977), 323–350.

  12. G. M. Landau and U. Vishkin, Fast string matching with k differences, J. Comput. System Sci., 37 (1988), 63–78.

  13. W. J. Masek and M. S. Paterson, A faster algorithm for computing string edit distances, J. Comput. System Sci., 20 (1980), 18–31.

  14. E. W. Myers, Incremental Alignment Algorithms and their Applications, Technical Report 86-22, Department of Computer Science, University of Arizona, 1986.

  15. E. W. Myers, A Sublinear Algorithm for Approximate Keywords Searching, Technical Report TR-90-25, Department of Computer Science, University of Arizona, 1990. Also in Algorithmica, 12 (1994), 345–374.

  16. E. W. Myers, A four-Russians algorithm for regular expression pattern matching, J. Assoc. Comput. Mach., 39 (1992), 430–448.

  17. R. Pinter, Efficient string matching with don't-care patterns, in Combinatorial Algorithms on Words (A. Apostolico and Z. Galil, eds.), NATO ASI Series, Vol. F12, Springer-Verlag, New York, 1985, pp. 11–29.

  18. P. H. Sellers, The theory and computations of evolutionary distances: pattern recognition, J. Algorithms, 1 (1980), 359–373.

  19. J. Tarhio and E. Ukkonen, Approximate Boyer-Moore string matching, SIAM J. Comput., 22(2) (1993), 243–260.

  20. E. Ukkonen, Finding approximate patterns in strings, J. Algorithms, 6 (1985), 132–137.

  21. E. Ukkonen, Approximate string-matching with q-grams and maximal matches, Theoret. Comput. Sci., 92 (1992), 191–211.

  22. E. Ukkonen and D. Wood, Approximate string matching with suffix automata, Algorithmica, 10 (1993), 353–364.

  23. R. A. Wagner and M. J. Fischer, The string to string correction problem, J. Assoc. Comput. Mach., 21 (1974), 168–173.

  24. S. Wu and U. Manber, Agrep - a fast approximate pattern-matching tool, Proc. Usenix Winter 1992 Technical Conference, San Francisco, January 1992, pp. 153–162.

  25. S. Wu and U. Manber, Fast text searching allowing errors, Comm. ACM, 35 (1992), 83–91.

  26. S. Wu, U. Manber, and E. W. Myers, A sub-quadratic algorithm for approximate regular expression matching,J. of Algorithms, to appear.

Additional information

Communicated by C. K. Wong.

The research of U. Manber was supported in part by a Presidential Young Investigator Award DCR-8451397, with matching funds from AT&T, and by NSF Grant CCR-9001619. The research of G. Myers was supported in part by NIH Grant LM04960, NSF Grant CCR-9001619, and the Aspen Center for Physics.

About this article

Cite this article

Wu, S., Manber, U. & Myers, G. A subquadratic algorithm for approximate limited expression matching. Algorithmica 15, 50–67 (1996). https://doi.org/10.1007/BF01942606
